## Testing Philosophy (Read This First)

Each module in IPSL-AID comes with its own dedicated tests. Before running large-scale training, users are strongly encouraged to:

- Explore the test suite
- Understand data handling and normalization
- Validate diffusion and architecture choices
- Tune hyperparameters using small debug runs

This approach minimizes wasted compute time and ensures scientifically correct usage.
## Why Testing Is Essential

Climate downscaling models involve:

- Complex neural architectures
- Sophisticated diffusion formulations
- Large-scale data processing
- Multi-GPU parallelization
- Numerical precision requirements

Without proper testing, errors can:

- Waste computational resources (days of GPU time)
- Produce scientifically invalid results
- Mask underlying implementation bugs
- Compromise reproducibility
## Recommended Workflow

1. **Unit tests** — run tests for individual components:

   ```bash
   python tests/test_all.py
   ```

2. **Integration tests** — test data loading and model initialization:

   ```bash
   python tests/test_integration.py
   ```

3. **Small-scale debug runs** — train on a small subset:

   ```bash
   # Use debug mode in the setup script
   debug=true num_epochs=1 batch_size=4
   ```

4. **Validation** — check results against known baselines.
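A debug run like step 3 can also be reproduced directly in Python. The sketch below is illustrative only — the tiny convolutional model and synthetic data stand in for IPSL-AID's actual UNet and climate fields, and are not part of the package:

```python
# Minimal debug-run sketch (assumptions: placeholder Conv2d model and
# random data stand in for the real UNet and climate dataset).
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Tiny synthetic stand-in for a gridded climate dataset
data = TensorDataset(torch.randn(64, 1, 32, 32))
debug_subset = Subset(data, range(8))        # train on just 8 samples
loader = DataLoader(debug_subset, batch_size=4)

model = torch.nn.Conv2d(1, 1, 3, padding=1)  # placeholder architecture
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(1):                       # num_epochs=1, as in debug mode
    for (x,) in loader:
        loss = torch.nn.functional.mse_loss(model(x), x)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

A run like this completes in seconds and quickly exposes shape mismatches, device errors, or exploding losses before any real compute is committed.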
## Test Coverage

IPSL-AID includes tests for:

- Data loading and preprocessing
- Diffusion model implementations (VE, VP, EDM, iDDPM)
- UNet architectures and variants
- Loss functions and training loops
- Inference and sampling procedures
- Evaluation metrics and diagnostics
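As an example of the kind of check such tests perform, here is a hypothetical unit test for normalization (the `normalize`/`denormalize` helpers below are illustrative, not taken from the IPSL-AID test suite): z-scored data should have zero mean and unit variance, and denormalizing should recover the original field.

```python
# Illustrative normalization round-trip test; helper functions are
# assumptions, not IPSL-AID's actual preprocessing API.
import torch

def normalize(x, mean, std):
    return (x - mean) / std

def denormalize(z, mean, std):
    return z * std + mean

def test_normalization_roundtrip():
    x = torch.randn(16, 1, 8, 8) * 5.0 + 2.0   # synthetic field
    mean, std = x.mean(), x.std()
    z = normalize(x, mean, std)
    # z-scored data: zero mean, unit std (within float tolerance)
    assert abs(z.mean().item()) < 1e-5
    assert abs(z.std().item() - 1.0) < 1e-5
    # round trip recovers the original values
    assert torch.allclose(denormalize(z, mean, std), x, atol=1e-5)

test_normalization_roundtrip()
```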
## Debugging Tips

- Use `torch.autograd.set_detect_anomaly(True)` during development to pinpoint the operation that produces NaNs or infs in the backward pass
- Monitor GPU memory usage with `nvidia-smi`
- Check numerical stability by tracking gradient norms during training
- Validate data normalization and scaling before launching long runs
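The first and third tips above can be combined in a few lines. This is a generic PyTorch sketch (the linear model is a placeholder, not an IPSL-AID architecture) showing anomaly detection enabled alongside gradient-norm tracking:

```python
# Sketch: anomaly detection plus gradient-norm monitoring.
# The Linear model and random data are placeholders for illustration.
import torch

torch.autograd.set_detect_anomaly(True)  # traces the op that produced a NaN/inf

model = torch.nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()

# clip_grad_norm_ returns the total gradient norm; a large max_norm makes it
# a pure measurement. Spikes or NaNs here signal numerical instability.
grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1e9)
print(f"grad norm: {grad_norm.item():.4f}")
```

Logging this norm every step (e.g. to TensorBoard) makes divergence visible long before the loss itself becomes NaN.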
**Remember: test first, scale later.** A few hours of testing can save days of wasted computation.