What distinguishes VAEs from other autoencoders is the unique way they encode latent space and the different use cases to which their probabilistic encoding can be applied.
Unlike most autoencoders, which are deterministic models that encode a single fixed vector of latent variables, VAEs are probabilistic models. A VAE encodes the latent variables of the training data not as a fixed value z, but as a continuous range of possibilities expressed as a probability distribution, p(z).
In Bayesian statistics, this learned range of possibilities for the latent variable is called the prior distribution. In variational inference, the technique these generative models use to learn their latent space, that prior is used to approximate the posterior distribution, p(z|x): in other words, the distribution of the latent variables z given an observed data point x.
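For reference, Bayes' theorem relates the two: the posterior is the prior weighted by the likelihood p(x|z) and normalized by the evidence p(x), the quantity that the evidence lower bound (covered below) is named for.

```latex
p(z \mid x) = \frac{p(x \mid z)\, p(z)}{p(x)}
```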
Instead of a single latent vector, a VAE therefore encodes two: a vector of means, μ, and a vector of standard deviations, σ. Each element of these vectors corresponds to one latent attribute of the training data: the mean locates the most likely value for that attribute, while the standard deviation describes the expected spread of possibilities around it.
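Concretely, a VAE encoder typically ends in two parallel output layers, one per vector. The following is a minimal sketch in PyTorch; the class name, layer sizes, activation, and the log-variance parameterization are illustrative assumptions, not a reference implementation.

```python
# Minimal sketch of a VAE encoder that outputs a vector of means (mu) and a
# vector of standard deviations (sigma), defining a Gaussian over each latent attribute.
import torch
import torch.nn as nn

class VAEEncoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=16):
        super().__init__()
        self.hidden = nn.Linear(input_dim, hidden_dim)
        self.mu_head = nn.Linear(hidden_dim, latent_dim)       # vector of means, mu
        self.log_var_head = nn.Linear(hidden_dim, latent_dim)  # log-variance: a common, numerically stable way to parameterize sigma

    def forward(self, x):
        h = torch.relu(self.hidden(x))
        mu = self.mu_head(h)
        sigma = torch.exp(0.5 * self.log_var_head(h))  # vector of standard deviations, sigma
        return mu, sigma
```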
By randomly sampling from within this range of encoded possibilities, VAEs can synthesize new data samples that, while original in their own right, resemble the original training data. Though relatively intuitive in principle, this approach requires several adaptations to the standard autoencoder training process to be put into practice.
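To make the sampling step concrete, the hypothetical snippet below continues the sketch above: it draws a latent vector z from the Gaussian defined by μ and σ and decodes it into a new sample. The decoder architecture is assumed for illustration, and the line that forms z as the mean plus scaled Gaussian noise is, in fact, the reparameterization trick reviewed later, which is what makes this sampling step trainable.

```python
# Continuing the hypothetical sketch above: draw a latent vector from the encoded
# range of possibilities and decode it into a new, synthesized data point.
decoder = nn.Sequential(                     # illustrative decoder mirroring the encoder
    nn.Linear(16, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Sigmoid(),
)

encoder = VAEEncoder()
x = torch.rand(1, 784)                       # stand-in for one training example
mu, sigma = encoder(x)
z = mu + sigma * torch.randn_like(sigma)     # sample z ~ N(mu, sigma^2)
new_sample = decoder(z)                      # resembles, but does not copy, the training data
```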
To explain this ability of VAEs, we'll review the following concepts:
- Reconstruction loss
- Kullback-Leibler (KL) divergence
- Evidence lower bound (ELBO)
- The reparameterization trick