Quickstart

Basic Workflow

  1. Set up the environment (see Installation)

  2. Configure your experiment via setup script

  3. Test components (see Testing Philosophy (Read This First))

  4. Run training or inference

Running the Model

Model execution is controlled via the ipsl-aid command-line interface (CLI) or a setup bash script that generates SLURM submission scripts.

Option 1: Direct CLI Usage

The simplest way to run the model is using the ipsl-aid command:

# Show version and help
ipsl-aid --version
ipsl-aid --help

# Train a model
ipsl-aid --run_type train \
    --arch adm \
    --precond edm \
    --in_channels 1 \
    --out_channels 1 \
    --batch_size 32 \
    --num_epochs 100 \
    --datadir ./data \
    --main_folder runs

# Run inference
ipsl-aid --run_type inference \
    --load_checkpoint_name ./checkpoints/model.pth \
    --datadir ./data/test \
    --inference_type sampler

Option 2: SLURM Batch Submission (Recommended for HPC)

For HPC environments, use the setup script, which:

  • Defines all model, data, and training parameters

  • Generates a SLURM submission script

  • Encodes the full configuration in the output folder structure

Typical workflow:

  1. Edit the setup script:

     • Select the diffusion model (VE / VP / EDM / iDDPM)

     • Select the architecture (e.g. ADM UNet)

     • Define variables, normalization, and batch sizes

     • Choose training or inference mode

  2. Generate the SBATCH script:

./setup

  3. Submit the job:

sbatch slurm/sbatch_diffusion_*.sh

This approach ensures full reproducibility, as every run is uniquely tagged by its configuration.
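This tagging can be sketched as follows (a hypothetical illustration of the idea; the actual naming scheme is whatever the setup script generates):

```shell
# Hypothetical sketch: build a run tag from key parameters so the
# output folder uniquely identifies the configuration.
arch="adm"
precond="edm"
batch_size=32
num_epochs=100

run_tag="${arch}_${precond}_bs${batch_size}_ep${num_epochs}"
output_dir="runs/${run_tag}"
echo "$output_dir"
```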

Example Configuration

Below is a comprehensive reference of all command-line arguments accepted by the IPSL-AID diffusion model. These can be set in the setup bash script and are passed to the Python training script.

Command Line Arguments

  Argument                     Type    Description

Execution Mode

  --debug                      bool    Enable debug mode for reduced logging and testing (default: False)
  --run_type                   str     Run mode: train, resume_train, inference, or inference_regional (default: train)
  --region                     str     Geographic region: us, europe, or asia (for regional inference)
  --inference_type             str     Inference mode: direct (deterministic) or sampler (stochastic) (default: direct)

Data Configuration

  --datadir                    str     Main dataset directory path (required)
  --per_var_datadir            list    Per-variable data directories as VAR=path pairs
  --varnames_list              list    Variable names to train on (default: VAR_2T VAR_10U VAR_10V)
  --constant_varnames_list     list    Constant variable names (static fields) (default: z lsm)
  --constant_varnames_file     str     NetCDF file with constant variables (default: ERA5_const_sfc_variables.nc)
  --normalization_types        list    Normalization per variable as var=type pairs (e.g., VAR_2T=standard)
  --units_list                 list    Units for each variable (default: K m/s m/s)
  --dynamic_covariates         list    List of dynamic covariate names
  --dynamic_covariates_dir     str     Directory for dynamic covariates (default: ../data_covariates/)

Time Range

  --year_start                 int     Start year for training dataset (default: 1980)
  --year_end                   int     End year for training dataset (default: 2020)
  --year_start_test            int     Start year for test dataset (default: 2020)
  --year_end_test              int     End year for test dataset (default: 2022)
  --time_normalization         str     Time normalization type (e.g., linear, cos_sin) (default: linear)

Training Configuration

  --num_epochs                 int     Number of training epochs (default: 100)
  --batch_size                 int     Batch size for training (default: 8)
  --learning_rate              float   Learning rate (default: 1e-4)
  --num_workers                int     Number of DataLoader workers (default: 16)
  --tbatch                     int     Temporal batch length for processing (default: 1)
  --sbatch                     int     Number of spatial batches per timestamp (default: 8)
  --train_temporal_batch_mode  str     Mode: full (whole sequence) or partial (batched) (default: partial)
  --tbatch_train               int     Temporal batch length when mode=partial (default: 1)
  --test_temporal_batch_mode   str     Test mode: full or partial (default: full)
  --tbatch_test                int     Test temporal batch length (overrides if set)
  --test_spatial_batch_mode    str     Test spatial mode: full or partial (default: full)
  --sbatch_test                int     Test spatial batches (overrides if set)
  --batch_size_lat             int     Latitude grid points per spatial batch (must be odd) (default: 145)
  --batch_size_lon             int     Longitude grid points per spatial batch (must be odd) (default: 145)

Data Processing

  --epsilon                    float   Epsilon parameter for filtering (default: 0.02)
  --beta                       float   Beta parameter for loss function (default: 1.0)
  --margin                     int     Margin parameter for filtering (default: 8)

Output Configuration

  --main_folder                str     Main output folder name (default: experiment)
  --sub_folder                 str     Sub-folder name for current run (default: experiment)
  --prefix                     str     Prefix for saved files (default: run)
  --dtype                      str     Precision: fp16, fp32, or fp64 (default: fp32)

Model Architecture

  --arch                       str     Architecture: ddpmpp, ncsnpp, or adm (default: adm)
  --precond                    str     Preconditioner: vp, ve, edm, or unet (default: edm)
  --in_channels                int     Number of input variable channels (default: 3)
  --cond_channels              int     Number of conditioning channels (default: 0)
  --out_channels               int     Number of output channels (default: 3)

Checkpoint Configuration

  --save_model                 bool    Enable model checkpoint saving (default: False)
  --apply_filter               bool    Apply fine filtering for coarse data generation (default: False)
  --save_checkpoint_name       str     Name for saved checkpoints (default: diffusion_model_checkpoint)
  --save_per_samples           int     Save checkpoint every N samples (default: 10000)
  --load_checkpoint_name       str     Checkpoint file to load for resume/inference (default: model.pth.tar)

Regional Inference

  --region_center              float list (2)   [latitude, longitude] center for regional inference
  --region_size                int list (2)     [lat_size, lon_size] in grid points for regional inference

EDM Sampler Configuration (for stochastic inference with inference_type=sampler)

  --num_steps                  int     Number of sampling steps (default: 20)
  --sigma_min                  float   Minimum noise level (default: 0.002)
  --sigma_max                  float   Maximum noise level (default: 80.0)
  --rho                        float   Exponent for time step discretization (default: 7.0)
  --s_churn                    float   Stochasticity strength (default: 40)
  --s_min                      float   Minimum noise for stochasticity (default: 0)
  --s_max                      float   Maximum noise for stochasticity (default: inf)
  --s_noise                    float   Noise scale when stochasticity enabled (default: 1.0)
  --solver                     str     ODE solver: heun or euler (default: heun)

CRPS Evaluation

  --compute_crps               bool    Compute Continuous Ranked Probability Score (default: False)
  --crps_ensemble_size         int     Ensemble size for CRPS calculation (default: 10)
  --crps_batch_size            int     Batch size for CRPS computation (default: 2)
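As an illustration, the regional-inference flags above could be combined like this (a sketch only; the center, size, and checkpoint values are placeholders, and the two-element list flags are assumed to take space-separated values):

```shell
# Illustrative regional inference command built from the documented flags.
# Values are placeholders; adjust them to your own run.
cmd="ipsl-aid --run_type inference_regional \
  --region europe \
  --region_center 48.0 10.0 \
  --region_size 64 64 \
  --load_checkpoint_name model.pth.tar \
  --inference_type sampler"
echo "$cmd"
```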

Here is an example setup script snippet with commonly used parameters:

debug=true
run_type="train"
region=""
save_model=true
save_checkpoint_name="diffusion_model"
load_checkpoint_name="diffusion_model"
save_per_samples=10000

year_start=2019
year_end=2019
year_start_test=2020
year_end_test=2020

batch_size=70
num_epochs=1
learning_rate=0.0001
num_workers=8

datadir="/leonardo_work/EUHPC_D27_095/kkingston/AI-Downscaling/data/data_FOURxDaily"
per_var_datadir=(
  "VAR_2T=/leonardo_work/EUHPC_D27_095/kkingston/AI-Downscaling/data/data_FOURxDaily"
  )

time_normalization="cos_sin"

varnames_list=("VAR_2T")
constant_varnames_list=("z" "lsm")
constant_varnames_file="ERA5_const_sfc_variables.nc"
normalization_types=("VAR_2T=standard")
units_list=("K")
dynamic_covariates=()
dynamic_covariates_dir="../data_covariates/"

sbatch=12
tbatch=1800
batch_size_lat=145
batch_size_lon=361

epsilon=0.02
beta=1.0
margin=8

pretrained_path=""
model_name=""

dtype="fp32"
arch="adm"
precond="edm"
in_channels=1
cond_channels=5
out_channels=1
inference_type="sampler"

compute_crps=false

num_steps=10
sigma_min=0.002
sigma_max=80.0
rho=7
s_churn=40
solver="heun"

apply_filter=false
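The array-valued settings above are typically expanded into the generated command line as space-separated flag values. A minimal bash sketch (the flag names come from the argument reference; the exact expansion logic lives in the setup script):

```shell
#!/bin/bash
# Sketch: expand setup-script arrays into space-separated CLI flag values.
varnames_list=("VAR_2T")
normalization_types=("VAR_2T=standard")

# "${array[*]}" joins the elements with spaces.
args="--varnames_list ${varnames_list[*]} --normalization_types ${normalization_types[*]}"
echo "$args"
```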

Data Preparation

IPSL-AID is designed to work with ERA5 reanalysis data:

  1. Download ERA5 data (0.25° resolution)

  2. Preprocess using the provided scripts

  3. Set up train/validation/test splits (typically 2015-2019 train, 2020 validation, 2021 test)
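The split in step 3 maps directly onto the year-range flags from the argument reference (a sketch; the CLI exposes train and test ranges, so the validation year is assumed to be handled analogously):

```shell
# Sketch: express the typical split with the documented year-range settings.
year_start=2015        # training years 2015-2019
year_end=2019
year_start_test=2021   # test year 2021
year_end_test=2021
echo "train=${year_start}-${year_end} test=${year_start_test}-${year_end_test}"
```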

Evaluation Metrics

The model is evaluated using:

  • Mean Absolute Error (MAE): Pointwise accuracy

  • Root Mean Square Error (RMSE): Overall deviation

  • Coefficient of Determination (R²): Variance explained

  • Continuous Ranked Probability Score (CRPS): Probabilistic performance

  • Power Spectral Density (PSD): Spatial scale fidelity

  • Probability Density Functions (PDFs): Distribution matching
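For reference, the three pointwise scores over N grid points, with predictions ŷᵢ and targets yᵢ, are:

```latex
\mathrm{MAE}  = \frac{1}{N}\sum_{i=1}^{N} \lvert \hat{y}_i - y_i \rvert, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}, \qquad
R^2 = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}{\sum_{i}(y_i - \bar{y})^2}
```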

Next Steps