
I might be mistaken, but based on my current understanding, autoencoders typically consist of two components:

  • encoder $f_{\theta}(x) = z$
  • decoder $g_\phi(z)=\hat{x}$

The goal during training is to make the reconstructed output $\hat{x}$ as similar as possible to the original input $x$ using some reconstruction loss function.
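For concreteness, a single joint training step of that setup might look like the following PyTorch sketch (the layer sizes, MSE reconstruction loss, and optimizer here are just illustrative assumptions):

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 8))   # f_theta
decoder = nn.Sequential(nn.Linear(8, 128), nn.ReLU(), nn.Linear(128, 784))   # g_phi

params = list(encoder.parameters()) + list(decoder.parameters())
opt = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.MSELoss()            # one common choice of reconstruction loss

x = torch.rand(32, 784)           # stand-in batch of inputs
z = encoder(x)                    # z = f_theta(x)
x_hat = decoder(z)                # x_hat = g_phi(z)
loss = loss_fn(x_hat, x)          # compare reconstruction to the original input

opt.zero_grad()
loss.backward()                   # gradients flow through decoder and encoder alike,
opt.step()                        # so theta and phi are updated jointly
```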

Regardless of the specific type of autoencoder, the parameters of both the encoder and decoder are trained jointly on the same input data. As a result, the latent representation $z$ becomes tightly coupled with the decoder. This means that $z$ only has meaning or usefulness in the context of the decoder.

In other words, we can only interpret $z$ as representing a sample from the input distribution $\mathcal{D}$ if it is used together with the decoder $g_{\phi}$. Without the decoder, $z$ by itself does not necessarily carry any meaningful representation of samples from that distribution.

Can anyone correct my understanding? Autoencoders are widely used and well studied, so I suspect I am missing something.

Thank you!


3 Answers


Yes, you are correct. Autoencoders are useful for generalizing over the input distribution: the encoder maps a sample from that distribution to the latent space, and the decoder is expected to generate a near-correct reconstruction (some reconstruction error is unavoidable).


The latent representation, $z$ in the question, does contain enough information to approximate your sample data up to some error threshold, but in a transformed form. You need the specific decoder, which undoes this transformation, to recover your data, much as you need the appropriate decompressor to unzip a compressed file.

The structure of that latent space depends on your model. And, depending on the loss landscape, even the same model can "converge" (in the practical sense, where you stop training when the loss is close enough) to different solutions, for example when there are multiple local minima.

So yes, to do something useful with an autoencoder, you need both parts.
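One way to see this concretely is to train two copies of the same autoencoder separately and then decode one encoder's $z$ with the other model's decoder. Below is a rough sketch of that experiment (the tiny linear architecture, the synthetic low-rank data, and the training length are all arbitrary assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_autoencoder():
    return nn.Linear(20, 4), nn.Linear(4, 20)    # encoder, decoder

def train(enc, dec, data, steps=2000):
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-2)
    for _ in range(steps):
        loss = F.mse_loss(dec(enc(data)), data)
        opt.zero_grad()
        loss.backward()
        opt.step()

torch.manual_seed(0)
data = torch.randn(256, 4) @ torch.randn(4, 20)  # 20-dim data lying on a 4-dim subspace

enc_a, dec_a = make_autoencoder()
enc_b, dec_b = make_autoencoder()
train(enc_a, dec_a, data)
train(enc_b, dec_b, data)      # a second model, trained separately on the same data

with torch.no_grad():
    z = enc_a(data)
    matched    = F.mse_loss(dec_a(z), data)   # the decoder that z was trained with
    mismatched = F.mse_loss(dec_b(z), data)   # a decoder from a different training run
print(matched.item(), mismatched.item())      # the mismatched error is typically far larger
```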


You can do something useful with just the encoder part, so you can throw the decoder away after training: the encoder is doing dimensionality reduction.

An autoencoder with a 2-dimensional bottleneck is useful for visualizing higher-dimensional data, in a similar way to plotting the first two components from PCA, though it often gives a different insight into the data.
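As a sketch of that comparison (the data here is a random stand-in, and the architecture, training length, scikit-learn PCA, and matplotlib plotting are all just illustrative choices):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

X = torch.randn(1000, 50)                  # stand-in for your high-dimensional data

# Baseline: first two principal components
xy_pca = PCA(n_components=2).fit_transform(X.numpy())

# Autoencoder with a 2-dimensional bottleneck
enc = nn.Sequential(nn.Linear(50, 32), nn.ReLU(), nn.Linear(32, 2))
dec = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 50))
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
for _ in range(2000):
    loss = F.mse_loss(dec(enc(X)), X)
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    xy_ae = enc(X).numpy()                 # only the encoder is needed from here on

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.scatter(xy_pca[:, 0], xy_pca[:, 1], s=3)
ax1.set_title("PCA")
ax2.scatter(xy_ae[:, 0], xy_ae[:, 1], s=3)
ax2.set_title("2-D autoencoder")
plt.show()
```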

$z$ can also be used as input to another model. E.g. you might train a 784 --> 8 --> 784 autoencoder on MNIST data (28×28 = 784 pixels), then feed that 8-dimensional $z$ into another neural net that predicts which digit is in the image. Again, the decoder g() can be thrown away.
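A rough sketch of that reuse, with the shapes from the example above (the encoder here is an untrained stand-in for one you would already have trained as part of an autoencoder, and the single-layer classifier is just an illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 8))  # f_theta
classifier = nn.Linear(8, 10)               # predicts the digit from the 8-dim code

opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)
x = torch.rand(32, 784)                     # stand-in for a batch of flattened MNIST images
y = torch.randint(0, 10, (32,))             # stand-in digit labels

with torch.no_grad():                       # encoder frozen; the decoder g() is never used
    z = encoder(x)
logits = classifier(z)
loss = F.cross_entropy(logits, y)

opt.zero_grad()
loss.backward()
opt.step()
```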
