I might be mistaken, but based on my current understanding, autoencoders typically consist of two components:
- encoder $f_{\theta}(x) = z$
- decoder $g_{\phi}(z) = \hat{x}$
The goal during training is to make the reconstructed output $\hat{x}$ as similar as possible to the original input $x$, as measured by some reconstruction loss such as mean squared error.
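For concreteness, here is a minimal sketch of the setup I have in mind, assuming PyTorch and an MSE reconstruction loss (the layer sizes are arbitrary placeholders, not anything canonical):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder f_theta: maps input x to latent code z
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Decoder g_phi: maps latent code z back to a reconstruction x_hat
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128),
            nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)      # z = f_theta(x)
        x_hat = self.decoder(z)  # x_hat = g_phi(z)
        return x_hat, z

# Reconstruction loss: penalizes the difference between x_hat and x
reconstruction_loss = nn.MSELoss()
```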
Regardless of the specific type of autoencoder, the parameters of the encoder and decoder are trained jointly on the same input data. As a result, the latent representation $z$ becomes tightly coupled to the decoder, and (as I understand it) $z$ only has meaning or usefulness in the context of the decoder it was trained with.
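The joint training I am describing would look something like this, continuing the sketch above (it reuses the `Autoencoder` class and `reconstruction_loss` defined there; the hyperparameters are again arbitrary):

```python
# One optimizer updates theta and phi together,
# so the encoder and decoder co-adapt during training.
model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(x):
    optimizer.zero_grad()
    x_hat, z = model(x)
    loss = reconstruction_loss(x_hat, x)
    loss.backward()   # gradients flow through both g_phi and f_theta
    optimizer.step()  # encoder and decoder parameters are updated jointly
    return loss.item()

# Example: one training step on a random batch of 64 "inputs"
loss = train_step(torch.randn(64, 784))
```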
In other words, we can only interpret $z$ as representing a sample from the input distribution $\mathcal{D}$ when it is paired with the decoder $g_{\phi}$. Without the decoder, $z$ by itself does not necessarily carry any meaningful representation of the input distribution.
Can anyone correct my understanding? Autoencoders are so widely used and well studied that I suspect I am missing something.
Thank you!