Diffusion Models
================

Diffusion models are a class of generative models that progressively transform
simple noise into complex data distributions, such as images or climate fields.
Intuitively, they work in two phases:

1. **Forward diffusion**: A clean signal is gradually corrupted by adding
   Gaussian noise, eventually transforming it into a nearly pure Gaussian
   distribution.
2. **Reverse denoising**: A neural network is trained to gradually remove the
   noise, step by step, reconstructing the original data distribution from the
   noisy signal.

This can be visualized as follows:

.. figure:: ../../images/diffusion.png
   :width: 80%
   :align: center

   Two original Gaussian distributions are progressively transformed into a
   normal distribution. A denoising network then reconstructs the original
   distributions.

The framework implements several diffusion formulations commonly used in
state-of-the-art generative modeling:

- **VE (Variance Exploding)**
- **VP (Variance Preserving)**
- **EDM (Elucidated Diffusion Models)**
- **iDDPM (Improved DDPM)**

These formulations are selectable via configuration and can be paired with
different neural architectures.

EDM Preconditioning
-------------------

The EDM preconditioned model stabilizes training by standardizing the scales
of inputs, outputs, and targets across varying noise levels:

.. math::
   D_\theta(\mathbf{x}; \sigma) = c_\mathrm{skip}(\sigma) \, \mathbf{x}
   + c_\mathrm{out}(\sigma) \, F_\theta\big(c_\mathrm{in}(\sigma) \, \mathbf{x};
   c_\mathrm{noise}(\sigma)\big)

Where:

- :math:`\mathbf{x} = \mathbf{y} + \sigma \mathbf{n}` is the noisy input
- :math:`\mathbf{y}` is the clean signal
- :math:`\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})` is standard Gaussian noise
- :math:`\sigma` is the noise level
- :math:`F_\theta` is the underlying neural network

Coefficients:

.. math::
   c_\mathrm{in}(\sigma) = 1 / (\sigma_\mathrm{data}^2 + \sigma^2)^{1/2}

.. math::
   c_\mathrm{skip}(\sigma) = \sigma_\mathrm{data}^2 / (\sigma_\mathrm{data}^2 + \sigma^2)

.. math::
   c_\mathrm{out}(\sigma) = \sigma \, \sigma_\mathrm{data} / (\sigma_\mathrm{data}^2 + \sigma^2)^{1/2}

.. math::
   c_\mathrm{noise}(\sigma) = \tfrac{1}{4} \log \sigma
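For concreteness, the four coefficients and the preconditioned denoiser can be
sketched in a few lines. This is a minimal NumPy illustration, not the
framework's API: the ``net`` callable, function names, and the default
``sigma_data`` are assumptions for the example.

```python
import numpy as np

def edm_precond_coeffs(sigma, sigma_data=1.0):
    """EDM preconditioning coefficients for noise level ``sigma``."""
    s2 = sigma_data**2 + sigma**2
    c_in = 1 / np.sqrt(s2)
    c_skip = sigma_data**2 / s2
    c_out = sigma * sigma_data / np.sqrt(s2)
    c_noise = np.log(sigma) / 4
    return c_in, c_skip, c_out, c_noise

def denoise(net, x, sigma, sigma_data=1.0):
    """Preconditioned denoiser D_theta(x; sigma) wrapping a raw network ``net``.

    ``net(h, t)`` takes the rescaled input and the noise conditioning value.
    """
    c_in, c_skip, c_out, c_noise = edm_precond_coeffs(sigma, sigma_data)
    return c_skip * x + c_out * net(c_in * x, c_noise)
```

Note how at large :math:`\sigma` the skip term vanishes and the network output
dominates, while at small :math:`\sigma` the denoiser reduces to (nearly) the
identity on :math:`\mathbf{x}`.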
Loss Function
-------------

For each training sample, Gaussian noise :math:`\sigma \mathbf{n}` with a
randomly selected noise level :math:`\sigma` is added to the image. The network
is trained with a weighted denoising loss:

.. math::
   \mathcal{L} = \mathbb{E}_{\sigma, \mathbf{y}, \mathbf{n}} \left[
   \lambda(\sigma) \left\| D_\theta(\mathbf{y} + \sigma \mathbf{n}; \sigma)
   - \mathbf{y} \right\|_2^2 \right]

Where :math:`\lambda(\sigma) = (\sigma^2 + \sigma_\mathrm{data}^2) /
(\sigma \, \sigma_\mathrm{data})^2`.

Sampling
--------

High-resolution samples are generated by numerically solving the reverse-time
stochastic differential equation (SDE):

1. Initialize with Gaussian noise :math:`\mathbf{x}_0 \sim \mathcal{N}(\mathbf{0}, t_0^2 \mathbf{I})`
2. For each step :math:`i` from :math:`0` to :math:`N-1`:

   - Optionally add a temporary noise increment
   - Compute the denoising direction
   - Update the latent with an Euler/Heun scheme

3. Return the final denoised sample

Theoretical Background
----------------------

For theoretical background, see:

- DDPM
- Score-Based Generative Models
- EDM

Implementation Details
----------------------

Each diffusion formulation is implemented as a separate class with:

- **Noise scheduling**: Defines the :math:`\sigma(t)` progression
- **Sampling methods**: Different ODE/SDE solvers
- **Loss computation**: Formulation-specific weighting
- **Conditioning**: Support for various conditioning strategies
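To make the sampling loop concrete, here is a sketch of the deterministic
second-order (Heun) variant under EDM conventions, with the stochastic churn
omitted. This is an illustration, not the framework's solver: the ``denoise``
callable and the schedule parameters are assumptions for the example.

```python
import numpy as np

def edm_sigma_schedule(n_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """EDM time-step discretization: interpolate linearly in sigma**(1/rho)."""
    ramp = np.linspace(0, 1, n_steps)
    sigmas = (sigma_max**(1 / rho)
              + ramp * (sigma_min**(1 / rho) - sigma_max**(1 / rho)))**rho
    return np.append(sigmas, 0.0)  # final step lands exactly at sigma = 0

def heun_sample(denoise, shape, n_steps=40, rng=None):
    """Deterministic Heun sampler for the probability-flow ODE.

    ``denoise(x, sigma)`` is a preconditioned denoiser D_theta(x; sigma).
    """
    rng = rng or np.random.default_rng()
    sigmas = edm_sigma_schedule(n_steps)
    x = rng.standard_normal(shape) * sigmas[0]  # x_0 ~ N(0, t_0^2 I)
    for i in range(n_steps):
        t, t_next = sigmas[i], sigmas[i + 1]
        d = (x - denoise(x, t)) / t              # ODE direction dx/dt
        x_euler = x + (t_next - t) * d           # first-order (Euler) step
        if t_next > 0:                           # second-order correction
            d_next = (x_euler - denoise(x_euler, t_next)) / t_next
            x = x + (t_next - t) * 0.5 * (d + d_next)
        else:
            x = x_euler
    return x
```

Adding stochasticity (the ``s_churn``/``s_noise`` parameters in the
configuration below) amounts to temporarily raising the noise level before
each step, as described in step 2 of the sampling procedure.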
Configuration Example
---------------------

.. code-block:: yaml

   diffusion:
     type: "EDM"
     sigma_data: 1.0
     sigma_min: 0.002
     sigma_max: 80.0
     rho: 7.0
     p_mean: -1.2
     p_std: 1.2

   sampling:
     steps: 40
     sampler: "heun"
     s_churn: 40.0
     s_min: 0.05
     s_max: 50.0
     s_noise: 1.003

Comparison of Formulations
--------------------------

- **VE**: Simple, stable, good for continuous data
- **VP**: Common in image generation, well-studied
- **EDM**: State-of-the-art, excellent sample quality
- **iDDPM**: Improved training stability and sample quality
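As a final sketch of how the ``p_mean``/``p_std`` fields of the configuration
map onto the weighted denoising loss: noise levels are drawn log-normally,
:math:`\ln \sigma \sim \mathcal{N}(p_\mathrm{mean}, p_\mathrm{std}^2)`, and
the :math:`\lambda(\sigma)` weighting from the Loss Function section is
applied per sample. This NumPy fragment is illustrative only; the ``denoise``
callable and function names are assumptions, not the framework's API.

```python
import numpy as np

def edm_loss(denoise, y, p_mean=-1.2, p_std=1.2, sigma_data=1.0, rng=None):
    """Weighted denoising loss for one batch of clean signals ``y``."""
    rng = rng or np.random.default_rng()
    # One noise level per sample, broadcast over the remaining axes.
    sigma = np.exp(p_mean + p_std * rng.standard_normal(
        (y.shape[0],) + (1,) * (y.ndim - 1)))
    n = rng.standard_normal(y.shape)
    weight = (sigma**2 + sigma_data**2) / (sigma * sigma_data)**2  # lambda(sigma)
    err = denoise(y + sigma * n, sigma) - y
    return float(np.mean(weight * err**2))
```

A perfect denoiser drives the loss to zero; any residual error is penalized
more heavily at the noise levels where denoising is intrinsically easier.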