Quickstart
==========

Basic Workflow
--------------

1. **Setup environment** (see :doc:`installation`)
2. **Configure your experiment** via setup script
3. **Test components** (see :doc:`testing_philosophy`)
4. **Run training or inference**

Running the Model
-----------------

Model execution is controlled via the **ipsl-aid** command-line interface (CLI) or
a **setup bash script** that generates SLURM submission scripts.

**Option 1: Direct CLI Usage**

The simplest way to run the model is using the ``ipsl-aid`` command:

.. code-block:: bash

   # Show version and help
   ipsl-aid --version
   ipsl-aid --help

   # Train a model
   ipsl-aid --mode train \
       --arch adm \
       --precond edm \
       --in_channels 1 \
       --out_channels 1 \
       --batch_size 32 \
       --num_epochs 100 \
       --data_dir ./data \
       --output_dir ./runs

   # Run inference
   ipsl-aid --mode inference \
       --checkpoint ./checkpoints/model.pth \
       --data_dir ./data/test \
       --inference_type sampler

**Option 2: SLURM Batch Submission (Recommended for HPC)**

For HPC environments, use the setup script that:

- Defines all model, data, and training parameters
- Generates a SLURM submission script
- Encodes the full configuration in the output folder structure

Typical workflow:

1. Edit the setup script:
   - Select diffusion model (VE / VP / EDM / iDDPM)
   - Select architecture (e.g. ADM UNet)
   - Define variables, normalization, batch sizes
   - Choose training or inference mode

2. Generate the SBATCH script:

.. code-block:: bash

   ./setup

3. Submit the job:

.. code-block:: bash

   sbatch slurm/sbatch_diffusion_*.sh

This approach ensures **full reproducibility**, as every run is uniquely tagged
by its configuration.

Example Configuration
---------------------

Below is a comprehensive reference of all command line arguments accepted by the
IPSL-AID diffusion model. These arguments can be set in the setup bash script
and are passed to the Python training script.

.. list-table:: Command Line Arguments
   :header-rows: 1
   :widths: 25 15 50
   :stub-columns: 1

   * - Argument
     - Type
     - Description
   * - **Execution Mode**
     - -
     -
   * - ``--debug``
     - bool
     - Enable debug mode for reduced logging and testing (default: False)
   * - ``--run_type``
     - str
     - Run mode: ``train``, ``resume_train``, ``inference``, or ``inference_regional`` (default: train)
   * - ``--region``
     - str
     - Geographic region: ``us``, ``europe``, or ``asia`` (for regional inference)
   * - ``--inference_type``
     - str
     - Inference mode: ``direct`` (deterministic) or ``sampler`` (stochastic) (default: direct)
   * - **Data Configuration**
     - -
     -
   * - ``--datadir``
     - str
     - Main dataset directory path (**required**)
   * - ``--per_var_datadir``
     - list
     - Per-variable data directories as ``VAR=path`` pairs
   * - ``--varnames_list``
     - list
     - Variable names to train on (default: VAR_2T VAR_10U VAR_10V)
   * - ``--constant_varnames_list``
     - list
     - Constant variable names (static fields) (default: z lsm)
   * - ``--constant_varnames_file``
     - str
     - NetCDF file with constant variables (default: ERA5_const_sfc_variables.nc)
   * - ``--normalization_types``
     - list
     - Normalization per variable as ``var=type`` pairs (e.g., VAR_2T=standard)
   * - ``--units_list``
     - list
     - Units for each variable (default: K m/s m/s)
   * - ``--dynamic_covariates``
     - list
     - List of dynamic covariate names
   * - ``--dynamic_covariates_dir``
     - str
     - Directory for dynamic covariates (default: ../data_covariates/)
   * - **Time Range**
     - -
     -
   * - ``--year_start``
     - int
     - Start year for training dataset (default: 1980)
   * - ``--year_end``
     - int
     - End year for training dataset (default: 2020)
   * - ``--year_start_test``
     - int
     - Start year for test dataset (default: 2020)
   * - ``--year_end_test``
     - int
     - End year for test dataset (default: 2022)
   * - ``--time_normalization``
     - str
     - Time normalization type (e.g., linear, cos_sin) (default: linear)
   * - **Training Configuration**
     - -
     -
   * - ``--num_epochs``
     - int
     - Number of training epochs (default: 100)
   * - ``--batch_size``
     - int
     - Batch size for training (default: 8)
   * - ``--learning_rate``
     - float
     - Learning rate (default: 1e-4)
   * - ``--num_workers``
     - int
     - Number of DataLoader workers (default: 16)
   * - ``--tbatch``
     - int
     - Temporal batch length for processing (default: 1)
   * - ``--sbatch``
     - int
     - Number of spatial batches per timestamp (default: 8)
   * - ``--train_temporal_batch_mode``
     - str
     - Mode: ``full`` (whole sequence) or ``partial`` (batched) (default: partial)
   * - ``--tbatch_train``
     - int
     - Temporal batch length when mode=partial (default: 1)
   * - ``--test_temporal_batch_mode``
     - str
     - Test mode: ``full`` or ``partial`` (default: full)
   * - ``--tbatch_test``
     - int
     - Test temporal batch length (overrides if set)
   * - ``--test_spatial_batch_mode``
     - str
     - Test spatial mode: ``full`` or ``partial`` (default: full)
   * - ``--sbatch_test``
     - int
     - Test spatial batches (overrides if set)
   * - ``--batch_size_lat``
     - int
     - Latitude grid points per spatial batch (must be odd) (default: 145)
   * - ``--batch_size_lon``
     - int
     - Longitude grid points per spatial batch (must be odd) (default: 145)
   * - **Data Processing**
     - -
     -
   * - ``--epsilon``
     - float
     - Epsilon parameter for filtering (default: 0.02)
   * - ``--beta``
     - float
     - Beta parameter for loss function (default: 1.0)
   * - ``--margin``
     - int
     - Margin parameter for filtering (default: 8)
   * - **Output Configuration**
     - -
     -
   * - ``--main_folder``
     - str
     - Main output folder name (default: experiment)
   * - ``--sub_folder``
     - str
     - Sub-folder name for current run (default: experiment)
   * - ``--prefix``
     - str
     - Prefix for saved files (default: run)
   * - ``--dtype``
     - str
     - Precision: ``fp16``, ``fp32``, or ``fp64`` (default: fp32)
   * - **Model Architecture**
     - -
     -
   * - ``--arch``
     - str
     - Architecture: ``ddpmpp``, ``ncsnpp``, or ``adm`` (default: adm)
   * - ``--precond``
     - str
     - Preconditioner: ``vp``, ``ve``, ``edm``, or ``unet`` (default: edm)
   * - ``--in_channels``
     - int
     - Number of input variable channels (default: 3)
   * - ``--cond_channels``
     - int
     - Number of conditioning channels (default: 0)
   * - ``--out_channels``
     - int
     - Number of output channels (default: 3)
   * - **Checkpoint Configuration**
     - -
     -
   * - ``--save_model``
     - bool
     - Enable model checkpoint saving (default: False)
   * - ``--apply_filter``
     - bool
     - Apply fine filtering for coarse data generation (default: False)
   * - ``--save_checkpoint_name``
     - str
     - Name for saved checkpoints (default: diffusion_model_checkpoint)
   * - ``--save_per_samples``
     - int
     - Save checkpoint every N samples (default: 10000)
   * - ``--load_checkpoint_name``
     - str
     - Checkpoint file to load for resume/inference (default: model.pth.tar)
   * - **Regional Inference**
     - -
     -
   * - ``--region_center``
     - float list (2)
     - [latitude, longitude] center for regional inference
   * - ``--region_size``
     - int list (2)
     - [lat_size, lon_size] in grid points for regional inference
   * - **EDM Sampler Configuration**
     - -
     - (for stochastic inference with ``inference_type=sampler``)
   * - ``--num_steps``
     - int
     - Number of sampling steps (default: 20)
   * - ``--sigma_min``
     - float
     - Minimum noise level (default: 0.002)
   * - ``--sigma_max``
     - float
     - Maximum noise level (default: 80.0)
   * - ``--rho``
     - float
     - Exponent for time step discretization (default: 7.0)
   * - ``--s_churn``
     - float
     - Stochasticity strength (default: 40)
   * - ``--s_min``
     - float
     - Minimum noise for stochasticity (default: 0)
   * - ``--s_max``
     - float
     - Maximum noise for stochasticity (default: inf)
   * - ``--s_noise``
     - float
     - Noise scale when stochasticity enabled (default: 1.0)
   * - ``--solver``
     - str
     - ODE solver: ``heun`` or ``euler`` (default: heun)
   * - **CRPS Evaluation**
     - -
     -
   * - ``--compute_crps``
     - bool
     - Compute Continuous Ranked Probability Score (default: False)
   * - ``--crps_ensemble_size``
     - int
     - Ensemble size for CRPS calculation (default: 10)
   * - ``--crps_batch_size``
     - int
     - Batch size for CRPS computation (default: 2)

Here is an example setup script snippet with commonly used parameters:

.. code-block:: bash

   debug=true
   run_type="train"
   region=""
   save_model=true
   save_checkpoint_name="difusion_model"
   load_checkpoint_name="difusion_model"
   save_per_samples=10000

   year_start=2019
   year_end=2019
   year_start_test=2020
   year_end_test=2020

   batch_size=70
   num_epochs=1
   learning_rate=0.0001
   num_workers=8

   datadir="/leonardo_work/EUHPC_D27_095/kkingston/AI-Downscaling/data/data_FOURxDaily"
   per_var_datadir=(
     "VAR_2T=/leonardo_work/EUHPC_D27_095/kkingston/AI-Downscaling/data/data_FOURxDaily"
     )

   time_normalization="cos_sin"

   varnames_list=("VAR_2T")
   constant_varnames_list=("z" "lsm")
   constant_varnames_file="ERA5_const_sfc_variables.nc"
   normalization_types=("VAR_2T=standard")
   units_list=("K")
   dynamic_covariates=()
   dynamic_covariates_dir="../data_covariates/"

   sbatch=12
   tbatch=1800
   batch_size_lat=145
   batch_size_lon=361

   epsilon=0.02
   beta=1.0
   margin=8

   pretrained_path=""
   model_name=""

   dtype="fp32"
   arch="adm"
   precond="edm"
   in_channels=1
   cond_channels=5
   out_channels=1
   inference_type="sampler"

   compute_crps=false

   num_steps=10
   sigma_min=0.002
   sigma_max=80.0
   rho=7
   s_churn=40
   solver="heun"

   apply_filter=false

Data Preparation
----------------

IPSL-AID is designed to work with ERA5 reanalysis data:

1. Download ERA5 data (0.25° resolution)
2. Preprocess using the provided scripts
3. Set up train/validation/test splits (typically 2015-2019 train, 2020 validation, 2021 test)

Evaluation Metrics
------------------

The model is evaluated using:

- **Mean Absolute Error (MAE)**: Pointwise accuracy
- **Root Mean Square Error (RMSE)**: Overall deviation
- **Coefficient of Determination (R²)**: Variance explained
- **Continuous Ranked Probability Score (CRPS)**: Probabilistic performance
- **Power Spectral Density (PSD)**: Spatial scale fidelity
- **Probability Density Functions (PDFs)**: Distribution matching

Next Steps
----------

- Read the :doc:`testing_philosophy` before running large experiments
- Explore :doc:`diffusion_models` to understand model choices
- Check the :doc:`api/modules` for detailed module documentation