Diffusion Models
================

Diffusion models are a class of generative models that progressively transform
simple noise into complex data distributions, such as images or climate fields.
Intuitively, they work in two phases:

1. **Forward diffusion**: A clean signal is gradually corrupted by adding
   Gaussian noise, eventually transforming it into a nearly pure Gaussian
   distribution.
2. **Reverse denoising**: A neural network is trained to gradually remove the
   noise, step by step, reconstructing the original data distribution from the
   noisy signal.

This can be visualized as follows:

.. figure:: ../../images/diffusion.png
   :width: 80%
   :align: center

   Two original Gaussian distributions are progressively transformed into a
   normal distribution. A denoising network then reconstructs the original
   distributions.

The framework implements several diffusion formulations commonly used in
state-of-the-art generative modeling:

- **VE (Variance Exploding)**
- **VP (Variance Preserving)**
- **EDM (Elucidated Diffusion Models)**
- **iDDPM (Improved DDPM)**

These formulations are selectable via configuration and can be paired with
different neural architectures.

EDM Preconditioning
-------------------

The EDM preconditioned model stabilizes training by standardizing the scales
of inputs, outputs, and targets across varying noise levels:

.. math::
   D_\theta(\mathbf{x}; \sigma) = c_\mathrm{skip}(\sigma) \, \mathbf{x}
   + c_\mathrm{out}(\sigma) \, F_\theta\big(c_\mathrm{in}(\sigma) \, \mathbf{x};
   c_\mathrm{noise}(\sigma)\big)

Where:

- :math:`\mathbf{x} = \mathbf{y} + \sigma \mathbf{n}` is the noisy input
- :math:`\mathbf{y}` is the clean signal
- :math:`\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})` is standard Gaussian noise
- :math:`\sigma` is the noise level
- :math:`F_\theta` is the underlying neural network

Coefficients:

.. math::
   c_\mathrm{in}(\sigma) = 1 / (\sigma_\mathrm{data}^2 + \sigma^2)^{1/2}

.. math::
   c_\mathrm{skip}(\sigma) = \sigma_\mathrm{data}^2 / (\sigma_\mathrm{data}^2 + \sigma^2)

.. math::
   c_\mathrm{out}(\sigma) = \sigma \, \sigma_\mathrm{data} / (\sigma_\mathrm{data}^2 + \sigma^2)^{1/2}

.. math::
   c_\mathrm{noise}(\sigma) = \tfrac{1}{4} \log \sigma
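For concreteness, the four coefficients and the preconditioned denoiser can be
sketched in a few lines. This is a minimal NumPy illustration, not the
framework's API: the ``net`` callable, function names, and the default
``sigma_data`` are assumptions for the example.

```python
import numpy as np

def edm_precond_coeffs(sigma, sigma_data=1.0):
    """EDM preconditioning coefficients for noise level ``sigma``."""
    s2 = sigma_data**2 + sigma**2
    c_in = 1 / np.sqrt(s2)
    c_skip = sigma_data**2 / s2
    c_out = sigma * sigma_data / np.sqrt(s2)
    c_noise = np.log(sigma) / 4
    return c_in, c_skip, c_out, c_noise

def denoise(net, x, sigma, sigma_data=1.0):
    """Preconditioned denoiser D_theta(x; sigma) wrapping a raw network ``net``.

    ``net(h, t)`` takes the rescaled input and the noise conditioning value.
    """
    c_in, c_skip, c_out, c_noise = edm_precond_coeffs(sigma, sigma_data)
    return c_skip * x + c_out * net(c_in * x, c_noise)
```

Note how at large :math:`\sigma` the skip term vanishes and the network output
dominates, while at small :math:`\sigma` the denoiser reduces to (nearly) the
identity on :math:`\mathbf{x}`.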
Loss Function
-------------

For each training sample, Gaussian noise :math:`\sigma \mathbf{n}` with a
randomly selected noise level :math:`\sigma` is added to the image. The network
is trained with a weighted denoising loss:

.. math::
   \mathcal{L} = \mathbb{E}_{\sigma, \mathbf{y}, \mathbf{n}} \left[
   \lambda(\sigma) \left\| D_\theta(\mathbf{y} + \sigma \mathbf{n}; \sigma)
   - \mathbf{y} \right\|_2^2 \right]

Where :math:`\lambda(\sigma) = (\sigma^2 + \sigma_\mathrm{data}^2) /
(\sigma \, \sigma_\mathrm{data})^2`.

Sampling
--------

High-resolution samples are generated by numerically solving the reverse-time
stochastic differential equation (SDE):

1. Initialize with Gaussian noise :math:`\mathbf{x}_0 \sim \mathcal{N}(\mathbf{0}, t_0^2 \mathbf{I})`
2. For each step :math:`i` from :math:`0` to :math:`N-1`:

   - Optionally add a temporary noise increment
   - Compute the denoising direction
   - Update the latent with an Euler/Heun scheme

3. Return the final denoised sample

Theoretical Background
----------------------

For theoretical background, see:

- DDPM
- Score-Based Generative Models
- EDM

Implementation Details
----------------------

Each diffusion formulation is implemented as a separate class with:

- **Noise scheduling**: Defines the :math:`\sigma(t)` progression
- **Sampling methods**: Different ODE/SDE solvers
- **Loss computation**: Formulation-specific weighting
- **Conditioning**: Support for various conditioning strategies
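To make the sampling loop concrete, here is a sketch of the deterministic
second-order (Heun) variant under EDM conventions, with the stochastic churn
omitted. This is an illustration, not the framework's solver: the ``denoise``
callable and the schedule parameters are assumptions for the example.

```python
import numpy as np

def edm_sigma_schedule(n_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """EDM time-step discretization: interpolate linearly in sigma**(1/rho)."""
    ramp = np.linspace(0, 1, n_steps)
    sigmas = (sigma_max**(1 / rho)
              + ramp * (sigma_min**(1 / rho) - sigma_max**(1 / rho)))**rho
    return np.append(sigmas, 0.0)  # final step lands exactly at sigma = 0

def heun_sample(denoise, shape, n_steps=40, rng=None):
    """Deterministic Heun sampler for the probability-flow ODE.

    ``denoise(x, sigma)`` is a preconditioned denoiser D_theta(x; sigma).
    """
    rng = rng or np.random.default_rng()
    sigmas = edm_sigma_schedule(n_steps)
    x = rng.standard_normal(shape) * sigmas[0]  # x_0 ~ N(0, t_0^2 I)
    for i in range(n_steps):
        t, t_next = sigmas[i], sigmas[i + 1]
        d = (x - denoise(x, t)) / t              # ODE direction dx/dt
        x_euler = x + (t_next - t) * d           # first-order (Euler) step
        if t_next > 0:                           # second-order correction
            d_next = (x_euler - denoise(x_euler, t_next)) / t_next
            x = x + (t_next - t) * 0.5 * (d + d_next)
        else:
            x = x_euler
    return x
```

Adding stochasticity (the ``s_churn``/``s_noise`` parameters in the
configuration below) amounts to temporarily raising the noise level before
each step, as described in step 2 of the sampling procedure.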
Configuration Example
---------------------

.. code-block:: yaml

   diffusion:
     type: "EDM"
     sigma_data: 1.0
     sigma_min: 0.002
     sigma_max: 80.0
     rho: 7.0
     p_mean: -1.2
     p_std: 1.2

   sampling:
     steps: 40
     sampler: "heun"
     s_churn: 40.0
     s_min: 0.05
     s_max: 50.0
     s_noise: 1.003

Comparison of Formulations
--------------------------

- **VE**: Simple, stable, good for continuous data
- **VP**: Common in image generation, well-studied
- **EDM**: State-of-the-art, excellent sample quality
- **iDDPM**: Improved training stability and sample quality
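As a final sketch of how the ``p_mean``/``p_std`` fields of the configuration
map onto the weighted denoising loss: noise levels are drawn log-normally,
:math:`\ln \sigma \sim \mathcal{N}(p_\mathrm{mean}, p_\mathrm{std}^2)`, and
the :math:`\lambda(\sigma)` weighting from the Loss Function section is
applied per sample. This NumPy fragment is illustrative only; the ``denoise``
callable and function names are assumptions, not the framework's API.

```python
import numpy as np

def edm_loss(denoise, y, p_mean=-1.2, p_std=1.2, sigma_data=1.0, rng=None):
    """Weighted denoising loss for one batch of clean signals ``y``."""
    rng = rng or np.random.default_rng()
    # One noise level per sample, broadcast over the remaining axes.
    sigma = np.exp(p_mean + p_std * rng.standard_normal(
        (y.shape[0],) + (1,) * (y.ndim - 1)))
    n = rng.standard_normal(y.shape)
    weight = (sigma**2 + sigma_data**2) / (sigma * sigma_data)**2  # lambda(sigma)
    err = denoise(y + sigma * n, sigma) - y
    return float(np.mean(weight * err**2))
```

A perfect denoiser drives the loss to zero; any residual error is penalized
more heavily at the noise levels where denoising is intrinsically easier.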