An autoencoder is a neural network that learns to compress (encode) input data into a compact latent representation (bottleneck), then reconstruct (decode) the original from that representation. The reconstruction loss forces the network to capture the most essential features. Variational Autoencoders (VAE) extend this with probabilistic encoding: instead of a single point in latent space, each input maps to a distribution (mean + variance). VAEs enable controlled generation of new data — a predecessor to diffusion models and a key component in modern generative AI.
Real-life analogy: The summary
An autoencoder is like a journalist who summarises a 10,000-word report into a 100-word summary (encoding), then hands it to a colleague who reconstructs the full report from the summary (decoding). The quality of the summary is measured by how accurately the colleague can recreate the original. The journalist learns to include only the most important information — discarding noise. VAE adds: instead of writing one specific summary, the journalist writes a distribution of possible summaries.
Standard autoencoder
Convolutional autoencoder for image compression
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvAutoencoder(nn.Module):
    """Convolutional autoencoder for 28x28 images (MNIST-like)."""
    def __init__(self, latent_dim=32):
        super().__init__()
        # ── ENCODER: image → latent vector ──
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1),    # 28×28 → 14×14
            nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1),   # 14×14 → 7×7
            nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1),  # 7×7 → 4×4
            nn.ReLU(),
            nn.Flatten(),                                # 128×4×4 = 2048
            nn.Linear(2048, latent_dim)                  # 2048 → 32
        )
        # ── DECODER: latent vector → image ──
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 2048),
            nn.ReLU(),
            nn.Unflatten(1, (128, 4, 4)),
            nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=0),  # 4→7
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1),   # 7→14
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 3, stride=2, padding=1, output_padding=1),    # 14→28
            nn.Sigmoid()  # Output in [0,1] for pixel values
        )

    def forward(self, x):
        z = self.encoder(x)      # Compress to latent space
        x_hat = self.decoder(z)  # Reconstruct
        return x_hat, z

    def encode(self, x): return self.encoder(x)

    def decode(self, z): return self.decoder(z)
ae = ConvAutoencoder(latent_dim=32)
x_batch = torch.rand(32, 1, 28, 28)  # 32 grayscale 28x28 images, pixels in [0, 1]
x_hat, z = ae(x_batch)
print(f"Input: {x_batch.shape}")         # [32, 1, 28, 28]
print(f"Latent z: {z.shape}")            # [32, 32] — compressed!
print(f"Reconstruction: {x_hat.shape}")  # [32, 1, 28, 28]

# Training: minimise reconstruction loss
optimizer = torch.optim.Adam(ae.parameters(), lr=1e-3)
recon_loss = F.mse_loss(x_hat, x_batch)  # Or F.binary_cross_entropy (targets must lie in [0, 1])
print(f"Reconstruction loss: {recon_loss.item():.4f}")
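The single loss computation above extends naturally into a full training loop. A minimal sketch, using a tiny stand-in model and random tensors in place of a real MNIST DataLoader (in practice, substitute the ConvAutoencoder above and real batches):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny stand-in autoencoder so the sketch is self-contained;
# swap in the ConvAutoencoder above for a real run.
ae = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 32),   # encoder: 784 → 32
    nn.ReLU(),
    nn.Linear(32, 784),   # decoder: 32 → 784
    nn.Sigmoid(),
    nn.Unflatten(1, (1, 28, 28)),
)
optimizer = torch.optim.Adam(ae.parameters(), lr=1e-3)

for epoch in range(2):
    for _ in range(10):                      # stand-in for DataLoader batches
        x_batch = torch.rand(32, 1, 28, 28)  # pixels in [0, 1]
        x_hat = ae(x_batch)
        loss = F.mse_loss(x_hat, x_batch)    # reconstruction loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```

On real data the loss falls as the bottleneck learns to keep the features that matter for reconstruction; on random noise there is nothing to compress, so it plateaus quickly.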
# Applications of standard autoencoders:
# 1. Dimensionality reduction (latent z is a compressed representation)
# 2. Denoising (train with noisy input, clean target)
# 3. Anomaly detection (high reconstruction error = anomaly)

Variational Autoencoder (VAE)
The VAE trains with the ELBO loss, which has two terms. Term 1: reconstruction — how well the decoder reproduces the input from samples of the latent distribution. Term 2: KL divergence — regularises the latent distribution towards N(0, I). The KL term forces the latent space to be smooth and continuous, enabling interpolation and generation.
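Written out in the standard notation (q_φ is the encoder, p_θ the decoder, d the latent dimension), the per-sample loss is the negative ELBO, and the KL term has a closed form for a diagonal Gaussian encoder:

```latex
\mathcal{L}(x) =
\underbrace{\mathbb{E}_{z \sim q_\phi(z \mid x)}\!\left[-\log p_\theta(x \mid z)\right]}_{\text{reconstruction}}
\;+\;
\underbrace{D_{\mathrm{KL}}\!\left(q_\phi(z \mid x)\,\|\,\mathcal{N}(0, I)\right)}_{\text{regularisation}},
\qquad
D_{\mathrm{KL}} = \tfrac{1}{2}\sum_{j=1}^{d}\left(\mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1\right)
```

The code below computes exactly these two terms: `binary_cross_entropy` is the reconstruction term, and `kld` is the closed-form KL sum.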
Variational Autoencoder with reparameterisation trick
class VAE(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=20):
        super().__init__()
        # Encoder outputs mean AND log-variance of latent distribution
        self.encoder_shared = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU()
        )
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)      # Mean μ
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)  # Log-variance log(σ²)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim), nn.Sigmoid()
        )

    def encode(self, x):
        h = self.encoder_shared(x)
        mu = self.fc_mu(h)
        logvar = self.fc_logvar(h)
        return mu, logvar

    def reparameterise(self, mu, logvar):
        """
        Reparameterisation trick: z = μ + ε×σ where ε ~ N(0,I).
        Makes sampling differentiable so gradients can flow to the encoder!
        Without this trick, sampling from the distribution has no gradient.
        """
        std = torch.exp(0.5 * logvar)  # σ = exp(log(σ²) / 2)
        eps = torch.randn_like(std)    # ε ~ N(0, I)
        return mu + eps * std          # z ~ N(μ, σ²)

    def decode(self, z):
        return self.decoder(z)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterise(mu, logvar)
        x_hat = self.decode(z)
        return x_hat, mu, logvar

    def loss(self, x, x_hat, mu, logvar, beta=1.0):
        recon = F.binary_cross_entropy(x_hat, x, reduction='sum')
        # KL divergence: KL[N(μ,σ²) || N(0,I)] = ½ Σ (μ² + σ² - log(σ²) - 1)
        kld = -0.5 * torch.sum(1 + logvar - mu**2 - logvar.exp())
        return recon + beta * kld  # β-VAE: β>1 encourages disentangled latent factors
vae = VAE(input_dim=784, latent_dim=20)
x = torch.rand(32, 784)  # binary cross-entropy requires targets in [0, 1]
x_hat, mu, logvar = vae(x)
loss = vae.loss(x, x_hat, mu, logvar)
print(f"VAE loss: {loss.item():.2f}")

# Generate new images by sampling from prior N(0,I)
z_sample = torch.randn(16, 20)    # 16 random points in latent space
generated = vae.decode(z_sample)  # 16 new generated images
print(f"Generated: {generated.shape}")  # [16, 784]
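Training the VAE looks like the autoencoder loop, except the loss is the ELBO (reconstruction + KL). A self-contained sketch with a compact stand-in model (`TinyVAE` here is purely illustrative; substitute the VAE class above and real data in practice):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Compact stand-in VAE so the sketch runs on its own.
class TinyVAE(nn.Module):
    def __init__(self, d=784, h=64, k=8):
        super().__init__()
        self.enc = nn.Linear(d, h)
        self.fc_mu = nn.Linear(h, k)
        self.fc_logvar = nn.Linear(h, k)
        self.dec = nn.Sequential(nn.Linear(k, h), nn.ReLU(),
                                 nn.Linear(h, d), nn.Sigmoid())

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterise
        return self.dec(z), mu, logvar

model = TinyVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(20):          # stand-in for epochs over a DataLoader
    x = torch.rand(32, 784)     # pixels in [0, 1]
    x_hat, mu, logvar = model(x)
    recon = F.binary_cross_entropy(x_hat, x, reduction='sum')
    kld = -0.5 * torch.sum(1 + logvar - mu**2 - logvar.exp())
    loss = recon + kld          # negative ELBO
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final loss: {loss.item():.1f}")
```

Both terms are summed over the batch here (reduction='sum'), matching the loss method above; averaging instead changes the effective weight of the KL term, which is one reason β is often tuned.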
# Interpolation between two images
mu1, _ = vae.encode(x[:1])   # use the mean μ as each image's latent code
mu2, _ = vae.encode(x[1:2])

# Smooth interpolation in latent space
for alpha in [0.0, 0.25, 0.5, 0.75, 1.0]:
    z_interp = (1 - alpha) * mu1 + alpha * mu2
    img_interp = vae.decode(z_interp.detach())

Practice questions
- What is the bottleneck layer in an autoencoder and why is it essential? (Answer: The bottleneck is the compressed latent representation — a small-dimensional layer between encoder and decoder. Its small size forces the network to discard redundant information and preserve only the most essential features. Without a bottleneck, the network could learn the identity function (copy input to output) without learning useful features.)
- Why does a standard autoencoder struggle to generate NEW data? (Answer: Standard autoencoders map inputs to specific points in latent space — not necessarily in a smooth, continuous distribution. The space between mapped points may not decode to meaningful images. You cannot sample randomly and expect good generations. VAE forces the latent space to follow N(0,I), making every point generatable.)
- What is the reparameterisation trick and why is it needed? (Answer: Sampling z ~ N(μ, σ²) is not differentiable (sampling is a stochastic operation). The trick: z = μ + ε×σ where ε ~ N(0,I). Now ε is the randomness (not a function of parameters) and the gradient can flow through μ and σ to the encoder. Enables end-to-end training with backpropagation.)
- VAE loss has two terms. What does each term encourage? (Answer: Reconstruction term: decoder should accurately reconstruct the input from the sampled z — encourages quality. KL term: encoder distribution should be close to N(0,I) — encourages smooth, well-structured latent space enabling generation. β-VAE increases KL weight to disentangle latent factors.)
- How are autoencoders used for anomaly detection? (Answer: Train autoencoder on normal data only. At test time, compute reconstruction error for each sample. Normal samples have low error (the encoder has learned to represent them). Anomalies have high error (the encoder was not trained on their patterns, compresses them poorly, decoder cannot reconstruct them). High error = anomaly.)
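The last answer can be turned into a concrete scoring rule. A hedged sketch: `model` stands in for any autoencoder trained on normal data only, and the mean-plus-3-standard-deviations threshold is one common heuristic, not a fixed rule:

```python
import torch
import torch.nn as nn

# Stand-in autoencoder; in practice this would be a model trained
# on normal data only (e.g. the ConvAutoencoder above).
model = nn.Sequential(nn.Linear(784, 32), nn.ReLU(), nn.Linear(32, 784))

def anomaly_scores(x):
    """Per-sample reconstruction error; higher = more anomalous."""
    with torch.no_grad():
        x_hat = model(x)
    return ((x - x_hat) ** 2).mean(dim=1)

x = torch.rand(64, 784)
scores = anomaly_scores(x)                    # one score per sample
threshold = scores.mean() + 3 * scores.std()  # illustrative cut-off
flags = scores > threshold                    # True = flagged as anomaly
print(scores.shape, int(flags.sum()))
```

In production, the threshold is usually calibrated on a held-out set of normal data (e.g. a high percentile of its scores) rather than derived from the test batch itself.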
On LumiChats
VAEs are foundational to modern generative AI. The latent space interpolation concept (smoothly blending between two images) inspired many creative AI applications. LumiChats can help you implement denoising autoencoders for data cleaning and anomaly detection autoencoders for production systems.
Try it free