Quickstart
Basic Workflow
Setup environment (see Installation)
Configure your experiment via setup script
Test components (see Testing Philosophy (Read This First))
Run training or inference
Running the Model
Model execution is controlled via the `ipsl-aid` command-line interface (CLI) or via a setup bash script that generates SLURM submission scripts.
Option 1: Direct CLI Usage
The simplest way to run the model is with the `ipsl-aid` command:

```bash
# Show version and help
ipsl-aid --version
ipsl-aid --help

# Train a model
ipsl-aid --mode train \
    --arch adm \
    --precond edm \
    --in_channels 1 \
    --out_channels 1 \
    --batch_size 32 \
    --num_epochs 100 \
    --data_dir ./data \
    --output_dir ./runs

# Run inference
ipsl-aid --mode inference \
    --checkpoint ./checkpoints/model.pth \
    --data_dir ./data/test \
    --inference_type sampler
```
Option 2: SLURM Batch Submission (Recommended for HPC)
For HPC environments, use the setup script, which:

- Defines all model, data, and training parameters
- Generates a SLURM submission script
- Encodes the full configuration in the output folder structure
Typical workflow:
Edit the setup script:

- Select the diffusion model (VE / VP / EDM / iDDPM)
- Select the architecture (e.g. ADM UNet)
- Define variables, normalization, and batch sizes
- Choose training or inference mode
Generate the SBATCH script:

```bash
./setup
```

Submit the job:

```bash
sbatch slurm/sbatch_diffusion_*.sh
```
This approach ensures full reproducibility, as every run is uniquely tagged by its configuration.
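The idea of encoding the configuration in the output folder structure can be sketched as follows. This is an illustrative assumption, not the exact tagging scheme used by the setup script; the key names and separator format are hypothetical:

```python
# Illustrative sketch: build a unique run tag from a configuration
# dictionary, so that every output folder encodes its own settings.
# The keys and tag format below are assumptions, not IPSL-AID's scheme.

def make_run_tag(config: dict) -> str:
    """Join key-value pairs into a filesystem-friendly run tag."""
    parts = [f"{key}-{value}" for key, value in sorted(config.items())]
    return "_".join(parts)

config = {"arch": "adm", "precond": "edm", "batch_size": 32}
print(make_run_tag(config))  # arch-adm_batch_size-32_precond-edm
```

Because the tag is a deterministic function of the configuration, two runs with identical settings map to the same folder, and any folder name can be read back as its full configuration.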
Example Configuration
Below is a reference of the command-line arguments accepted by the IPSL-AID diffusion model. These arguments can be set in the setup bash script and are passed to the Python training script; run `ipsl-aid --help` for the complete, authoritative list.
| Argument | Type | Description |
|---|---|---|
| **Execution Mode** | | |
| `debug` | bool | Enable debug mode for reduced logging and testing (default: False) |
| `run_type` | str | Run mode: `train` or `inference` |
| `region` | str | Geographic region |
| `inference_type` | str | Inference mode (e.g. `sampler`) |
| **Data Configuration** | | |
| `datadir` | str | Main dataset directory path (required) |
| `per_var_datadir` | list | Per-variable data directories as `VAR=path` pairs |
| `varnames_list` | list | Variable names to train on (default: VAR_2T VAR_10U VAR_10V) |
| `constant_varnames_list` | list | Constant variable names (static fields) (default: z lsm) |
| `constant_varnames_file` | str | NetCDF file with constant variables (default: ERA5_const_sfc_variables.nc) |
| `normalization_types` | list | Normalization per variable as `VAR=type` pairs |
| `units_list` | list | Units for each variable (default: K m/s m/s) |
| `dynamic_covariates` | list | List of dynamic covariate names |
| `dynamic_covariates_dir` | str | Directory for dynamic covariates (default: ../data_covariates/) |
| **Time Range** | | |
| `year_start` | int | Start year for training dataset (default: 1980) |
| `year_end` | int | End year for training dataset (default: 2020) |
| `year_start_test` | int | Start year for test dataset (default: 2020) |
| `year_end_test` | int | End year for test dataset (default: 2022) |
| `time_normalization` | str | Time normalization type (e.g. linear, cos_sin) (default: linear) |
| **Training Configuration** | | |
| `num_epochs` | int | Number of training epochs (default: 100) |
| `batch_size` | int | Batch size for training (default: 8) |
| `learning_rate` | float | Learning rate (default: 1e-4) |
| `num_workers` | int | Number of DataLoader workers (default: 16) |
| `tbatch` | int | Temporal batch length for processing (default: 1) |
| `sbatch` | int | Number of spatial batches per timestamp (default: 8) |
| | str | Temporal batching mode |
| | int | Temporal batch length when mode=partial (default: 1) |
| | str | Test mode |
| | int | Test temporal batch length (overrides if set) |
| | str | Test spatial mode |
| | int | Test spatial batches (overrides if set) |
| `batch_size_lat` | int | Latitude grid points per spatial batch (must be odd) (default: 145) |
| `batch_size_lon` | int | Longitude grid points per spatial batch (must be odd) (default: 145) |
| **Data Processing** | | |
| `epsilon` | float | Epsilon parameter for filtering (default: 0.02) |
| `beta` | float | Beta parameter for loss function (default: 1.0) |
| `margin` | int | Margin parameter for filtering (default: 8) |
| **Output Configuration** | | |
| | str | Main output folder name (default: experiment) |
| | str | Sub-folder name for current run (default: experiment) |
| | str | Prefix for saved files (default: run) |
| `dtype` | str | Precision (e.g. `fp32`) |
| **Model Architecture** | | |
| `arch` | str | Architecture (e.g. `adm`) |
| `precond` | str | Preconditioner (e.g. `edm`) |
| `in_channels` | int | Number of input variable channels (default: 3) |
| `cond_channels` | int | Number of conditioning channels (default: 0) |
| `out_channels` | int | Number of output channels (default: 3) |
| **Checkpoint Configuration** | | |
| `save_model` | bool | Enable model checkpoint saving (default: False) |
| `apply_filter` | bool | Apply fine filtering for coarse data generation (default: False) |
| `save_checkpoint_name` | str | Name for saved checkpoints (default: diffusion_model_checkpoint) |
| `save_per_samples` | int | Save checkpoint every N samples (default: 10000) |
| `load_checkpoint_name` | str | Checkpoint file to load for resume/inference (default: model.pth.tar) |
| **Regional Inference** | | |
| | float list (2) | [latitude, longitude] center for regional inference |
| | int list (2) | [lat_size, lon_size] in grid points for regional inference |
| **EDM Sampler Configuration** (for stochastic inference with `--inference_type sampler`) | | |
| `num_steps` | int | Number of sampling steps (default: 20) |
| `sigma_min` | float | Minimum noise level (default: 0.002) |
| `sigma_max` | float | Maximum noise level (default: 80.0) |
| `rho` | float | Exponent for time step discretization (default: 7.0) |
| `s_churn` | float | Stochasticity strength (default: 40) |
| | float | Minimum noise level for stochasticity (default: 0) |
| | float | Maximum noise level for stochasticity (default: inf) |
| | float | Noise scale when stochasticity is enabled (default: 1.0) |
| `solver` | str | ODE solver (e.g. `heun`) |
| **CRPS Evaluation** | | |
| `compute_crps` | bool | Compute Continuous Ranked Probability Score (default: False) |
| | int | Ensemble size for CRPS calculation (default: 10) |
| | int | Batch size for CRPS computation (default: 2) |
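The sampler parameters `num_steps`, `sigma_min`, `sigma_max`, and `rho` jointly define the noise-level schedule. Assuming the standard EDM discretization of Karras et al. (an assumption for illustration; the actual schedule lives in the training script), it can be sketched as:

```python
# Sketch of the EDM noise-level schedule (Karras et al., 2022):
#   sigma_i = (sigma_max^(1/rho) + i/(N-1) * (sigma_min^(1/rho) - sigma_max^(1/rho)))^rho
# This illustrates how num_steps, sigma_min, sigma_max, and rho interact;
# it is not necessarily the exact code used by IPSL-AID.

def edm_sigmas(num_steps=20, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Return the decreasing list of noise levels sigma_0 > ... > sigma_{N-1}."""
    hi = sigma_max ** (1.0 / rho)
    lo = sigma_min ** (1.0 / rho)
    return [(hi + i / (num_steps - 1) * (lo - hi)) ** rho
            for i in range(num_steps)]

sigmas = edm_sigmas()
print(round(sigmas[0], 3), round(sigmas[-1], 3))  # 80.0 0.002
```

Larger `rho` concentrates more steps near `sigma_min`, spending the sampling budget on the low-noise end where fine-scale detail is resolved.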
Here is an example setup script snippet with commonly used parameters:
```bash
debug=true
run_type="train"
region=""
save_model=true
save_checkpoint_name="diffusion_model"
load_checkpoint_name="diffusion_model"
save_per_samples=10000
year_start=2019
year_end=2019
year_start_test=2020
year_end_test=2020
batch_size=70
num_epochs=1
learning_rate=0.0001
num_workers=8
datadir="/leonardo_work/EUHPC_D27_095/kkingston/AI-Downscaling/data/data_FOURxDaily"
per_var_datadir=(
    "VAR_2T=/leonardo_work/EUHPC_D27_095/kkingston/AI-Downscaling/data/data_FOURxDaily"
)
time_normalization="cos_sin"
varnames_list=("VAR_2T")
constant_varnames_list=("z" "lsm")
constant_varnames_file="ERA5_const_sfc_variables.nc"
normalization_types=("VAR_2T=standard")
units_list=("K")
dynamic_covariates=()
dynamic_covariates_dir="../data_covariates/"
sbatch=12
tbatch=1800
batch_size_lat=145
batch_size_lon=361
epsilon=0.02
beta=1.0
margin=8
pretrained_path=""
model_name=""
dtype="fp32"
arch="adm"
precond="edm"
in_channels=1
cond_channels=5
out_channels=1
inference_type="sampler"
compute_crps=false
num_steps=10
sigma_min=0.002
sigma_max=80.0
rho=7
s_churn=40
solver="heun"
apply_filter=false
```
Data Preparation
IPSL-AID is designed to work with ERA5 reanalysis data:
Download ERA5 data (0.25° resolution)
Preprocess using the provided scripts
Set up train/validation/test splits (typically 2015-2019 train, 2020 validation, 2021 test)
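The year-based split above can be expressed with a small helper. The function below is a hypothetical sketch, not part of IPSL-AID; it simply partitions an inclusive year range, holding out the last years for validation and test:

```python
# Hypothetical helper: partition a range of years into train/validation/test
# splits, mirroring the typical 2015-2019 / 2020 / 2021 setup described above.

def split_years(first, last, n_val=1, n_test=1):
    """Return (train, validation, test) year lists for the range [first, last]."""
    years = list(range(first, last + 1))
    assert len(years) > n_val + n_test, "not enough years to split"
    n_train = len(years) - n_val - n_test
    train = years[:n_train]
    val = years[n_train:n_train + n_val]
    test = years[n_train + n_val:]
    return train, val, test

train, val, test = split_years(2015, 2021)
print(train, val, test)  # [2015, 2016, 2017, 2018, 2019] [2020] [2021]
```

Keeping the test years strictly after the training years avoids temporal leakage between the splits.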
Evaluation Metrics
The model is evaluated using:
Mean Absolute Error (MAE): Pointwise accuracy
Root Mean Square Error (RMSE): Overall deviation
Coefficient of Determination (R²): Variance explained
Continuous Ranked Probability Score (CRPS): Probabilistic performance
Power Spectral Density (PSD): Spatial scale fidelity
Probability Density Functions (PDFs): Distribution matching
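Several of these metrics are straightforward to sketch in plain Python. The implementations below are minimal illustrations (the ensemble CRPS uses the standard kernel form), not the evaluation code shipped with IPSL-AID:

```python
# Minimal metric sketches, assuming flat lists of paired values.

def mae(pred, obs):
    """Mean Absolute Error over paired samples."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)

def rmse(pred, obs):
    """Root Mean Square Error over paired samples."""
    return (sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)) ** 0.5

def crps_ensemble(ensemble, obs):
    """Ensemble CRPS for one observation (kernel form):
    mean|x_i - obs| - 0.5 * mean|x_i - x_j| over ensemble members."""
    m = len(ensemble)
    term1 = sum(abs(x - obs) for x in ensemble) / m
    term2 = sum(abs(x - y) for x in ensemble for y in ensemble) / (m * m)
    return term1 - 0.5 * term2

print(mae([1.0, 2.0], [1.5, 1.5]))  # 0.5
```

Note that CRPS reduces to the absolute error when the ensemble collapses to a single value, which makes it directly comparable to MAE.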
Next Steps
Read the Testing Philosophy (Read This First) before running large experiments
Explore Diffusion Models to understand model choices
Check IPSL_AID for detailed module documentation