Quickstart

Basic Workflow

  1. Set up the environment (see Installation)

  2. Configure your experiment via setup script

  3. Test components (see Testing Philosophy (Read This First))

  4. Run training or inference

Running the Model

Model execution is controlled via the ipsl-aid command-line interface (CLI) or a setup bash script that generates SLURM submission scripts.

Option 1: Direct CLI Usage

The simplest way to run the model is using the ipsl-aid command:

# Show version and help
ipsl-aid --version
ipsl-aid --help

# Train a model
ipsl-aid --run_type train \
    --arch adm \
    --precond edm \
    --in_channels 1 \
    --out_channels 1 \
    --batch_size 32 \
    --num_epochs 100 \
    --datadir ./data \
    --main_folder runs

# Run inference
ipsl-aid --run_type inference \
    --load_checkpoint_name ./checkpoints/model.pth \
    --datadir ./data/test \
    --inference_type sampler

Option 2: SLURM Batch Submission (Recommended for HPC)

For HPC environments, use the setup script, which:

  • Defines all model, data, and training parameters

  • Generates a SLURM submission script

  • Encodes the full configuration in the output folder structure

Typical workflow:

  1. Edit the setup script:

     • Select the diffusion model (VE / VP / EDM / iDDPM)

     • Select the architecture (e.g. ADM UNet)

     • Define variables, normalization, and batch sizes

     • Choose training or inference mode

  2. Generate the SBATCH script:

./setup

  3. Submit the job:

sbatch slurm/sbatch_diffusion_*.sh

This approach ensures full reproducibility, as every run is uniquely tagged by its configuration.
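This tagging can be sketched as follows (a hypothetical illustration of the idea; the actual naming scheme is whatever the setup script generates):

```shell
# Hypothetical sketch: build a run tag from key parameters so the
# output folder uniquely identifies the configuration.
arch="adm"
precond="edm"
batch_size=32
num_epochs=100

run_tag="${arch}_${precond}_bs${batch_size}_ep${num_epochs}"
output_dir="runs/${run_tag}"
echo "$output_dir"
```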

Example Configuration

Below is a comprehensive reference of all command-line arguments accepted by the IPSL-AID diffusion model. These can be set in the setup bash script and are passed to the Python training script.

Command Line Arguments

  Argument                     Type    Description

Execution Mode

  --debug                      bool    Enable debug mode for reduced logging and testing (default: False)
  --run_type                   str     Run mode: train, resume_train, inference, or inference_regional (default: train)
  --region                     str     Geographic region: us, europe, or asia (for regional inference)
  --inference_type             str     Inference mode: direct (deterministic) or sampler (stochastic) (default: direct)

Data Configuration

  --datadir                    str     Main dataset directory path (required)
  --per_var_datadir            list    Per-variable data directories as VAR=path pairs
  --varnames_list              list    Variable names to train on (default: VAR_2T VAR_10U VAR_10V)
  --constant_varnames_list     list    Constant variable names (static fields) (default: z lsm)
  --constant_varnames_file     str     NetCDF file with constant variables (default: ERA5_const_sfc_variables.nc)
  --normalization_types        list    Normalization per variable as var=type pairs (e.g., VAR_2T=standard)
  --units_list                 list    Units for each variable (default: K m/s m/s)
  --dynamic_covariates         list    List of dynamic covariate names
  --dynamic_covariates_dir     str     Directory for dynamic covariates (default: ../data_covariates/)

Time Range

  --year_start                 int     Start year for training dataset (default: 1980)
  --year_end                   int     End year for training dataset (default: 2020)
  --year_start_test            int     Start year for test dataset (default: 2020)
  --year_end_test              int     End year for test dataset (default: 2022)
  --time_normalization         str     Time normalization type (e.g., linear, cos_sin) (default: linear)

Training Configuration

  --num_epochs                 int     Number of training epochs (default: 100)
  --batch_size                 int     Batch size for training (default: 8)
  --learning_rate              float   Learning rate (default: 1e-4)
  --num_workers                int     Number of DataLoader workers (default: 16)
  --tbatch                     int     Temporal batch length for processing (default: 1)
  --sbatch                     int     Number of spatial batches per timestamp (default: 8)
  --train_temporal_batch_mode  str     Mode: full (whole sequence) or partial (batched) (default: partial)
  --tbatch_train               int     Temporal batch length when mode=partial (default: 1)
  --test_temporal_batch_mode   str     Test mode: full or partial (default: full)
  --tbatch_test                int     Test temporal batch length (overrides if set)
  --test_spatial_batch_mode    str     Test spatial mode: full or partial (default: full)
  --sbatch_test                int     Test spatial batches (overrides if set)
  --batch_size_lat             int     Latitude grid points per spatial batch (must be odd) (default: 145)
  --batch_size_lon             int     Longitude grid points per spatial batch (must be odd) (default: 145)

Data Processing

  --epsilon                    float   Epsilon parameter for filtering (default: 0.02)
  --beta                       float   Beta parameter for loss function (default: 1.0)
  --margin                     int     Margin parameter for filtering (default: 8)

Output Configuration

  --main_folder                str     Main output folder name (default: experiment)
  --sub_folder                 str     Sub-folder name for current run (default: experiment)
  --prefix                     str     Prefix for saved files (default: run)
  --dtype                      str     Precision: fp16, fp32, or fp64 (default: fp32)

Model Architecture

  --arch                       str     Architecture: ddpmpp, ncsnpp, or adm (default: adm)
  --precond                    str     Preconditioner: vp, ve, edm, or unet (default: edm)
  --in_channels                int     Number of input variable channels (default: 3)
  --cond_channels              int     Number of conditioning channels (default: 0)
  --out_channels               int     Number of output channels (default: 3)

Checkpoint Configuration

  --save_model                 bool    Enable model checkpoint saving (default: False)
  --apply_filter               bool    Apply fine filtering for coarse data generation (default: False)
  --save_checkpoint_name       str     Name for saved checkpoints (default: diffusion_model_checkpoint)
  --save_per_samples           int     Save checkpoint every N samples (default: 10000)
  --load_checkpoint_name       str     Checkpoint file to load for resume/inference (default: model.pth.tar)

Regional Inference

  --region_center              float list (2)   [latitude, longitude] center for regional inference
  --region_size                int list (2)     [lat_size, lon_size] in grid points for regional inference

EDM Sampler Configuration (for stochastic inference with inference_type=sampler)

  --num_steps                  int     Number of sampling steps (default: 20)
  --sigma_min                  float   Minimum noise level (default: 0.002)
  --sigma_max                  float   Maximum noise level (default: 80.0)
  --rho                        float   Exponent for time step discretization (default: 7.0)
  --s_churn                    float   Stochasticity strength (default: 40)
  --s_min                      float   Minimum noise for stochasticity (default: 0)
  --s_max                      float   Maximum noise for stochasticity (default: inf)
  --s_noise                    float   Noise scale when stochasticity enabled (default: 1.0)
  --solver                     str     ODE solver: heun or euler (default: heun)

CRPS Evaluation

  --compute_crps               bool    Compute Continuous Ranked Probability Score (default: False)
  --crps_ensemble_size         int     Ensemble size for CRPS calculation (default: 10)
  --crps_batch_size            int     Batch size for CRPS computation (default: 2)
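As an illustration, the regional-inference flags above could be combined like this (a sketch only; the center, size, and checkpoint values are placeholders, and the two-element list flags are assumed to take space-separated values):

```shell
# Illustrative regional inference command built from the documented flags.
# Values are placeholders; adjust them to your own run.
cmd="ipsl-aid --run_type inference_regional \
  --region europe \
  --region_center 48.0 10.0 \
  --region_size 64 64 \
  --load_checkpoint_name model.pth.tar \
  --inference_type sampler"
echo "$cmd"
```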

Here is an example setup script snippet with commonly used parameters:

debug=true
run_type="train"
region=""
save_model=true
save_checkpoint_name="diffusion_model"
load_checkpoint_name="diffusion_model"
save_per_samples=10000

year_start=2019
year_end=2019
year_start_test=2020
year_end_test=2020

batch_size=70
num_epochs=1
learning_rate=0.0001
num_workers=8

datadir="/leonardo_work/EUHPC_D27_095/kkingston/AI-Downscaling/data/data_FOURxDaily"
per_var_datadir=(
  "VAR_2T=/leonardo_work/EUHPC_D27_095/kkingston/AI-Downscaling/data/data_FOURxDaily"
  )

time_normalization="cos_sin"

varnames_list=("VAR_2T")
constant_varnames_list=("z" "lsm")
constant_varnames_file="ERA5_const_sfc_variables.nc"
normalization_types=("VAR_2T=standard")
units_list=("K")
dynamic_covariates=()
dynamic_covariates_dir="../data_covariates/"

sbatch=12
tbatch=1800
batch_size_lat=145
batch_size_lon=361

epsilon=0.02
beta=1.0
margin=8

pretrained_path=""
model_name=""

dtype="fp32"
arch="adm"
precond="edm"
in_channels=1
cond_channels=5
out_channels=1
inference_type="sampler"

compute_crps=false

num_steps=10
sigma_min=0.002
sigma_max=80.0
rho=7
s_churn=40
solver="heun"

apply_filter=false
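The array-valued settings above are typically expanded into the generated command line as space-separated flag values. A minimal bash sketch (the flag names come from the argument reference; the exact expansion logic lives in the setup script):

```shell
#!/bin/bash
# Sketch: expand setup-script arrays into space-separated CLI flag values.
varnames_list=("VAR_2T")
normalization_types=("VAR_2T=standard")

# "${array[*]}" joins the elements with spaces.
args="--varnames_list ${varnames_list[*]} --normalization_types ${normalization_types[*]}"
echo "$args"
```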

Data Preparation

IPSL-AID is designed to work with ERA5 reanalysis data:

  1. Download ERA5 data (0.25° resolution)

  2. Preprocess using the provided scripts

  3. Set up train/validation/test splits (typically 2015-2019 train, 2020 validation, 2021 test)
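The split in step 3 maps directly onto the year-range flags from the argument reference (a sketch; the CLI exposes train and test ranges, so the validation year is assumed to be handled analogously):

```shell
# Sketch: express the typical split with the documented year-range settings.
year_start=2015        # training years 2015-2019
year_end=2019
year_start_test=2021   # test year 2021
year_end_test=2021
echo "train=${year_start}-${year_end} test=${year_start_test}-${year_end_test}"
```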

Evaluation Metrics

The model is evaluated using:

  • Mean Absolute Error (MAE): Pointwise accuracy

  • Root Mean Square Error (RMSE): Overall deviation

  • Coefficient of Determination (R²): Variance explained

  • Continuous Ranked Probability Score (CRPS): Probabilistic performance

  • Power Spectral Density (PSD): Spatial scale fidelity

  • Probability Density Functions (PDFs): Distribution matching
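For reference, the three pointwise scores over N grid points, with predictions ŷᵢ and targets yᵢ, are:

```latex
\mathrm{MAE}  = \frac{1}{N}\sum_{i=1}^{N} \lvert \hat{y}_i - y_i \rvert, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}, \qquad
R^2 = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}{\sum_{i}(y_i - \bar{y})^2}
```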

Next Steps