Neural Architectures
====================

RTnn supports multiple neural network architectures for radiative transfer modeling.

Recurrent Neural Networks (RNN)
-------------------------------

LSTM (Long Short-Term Memory)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Bidirectional LSTM with final Conv1d projection.

.. code-block:: python

   from rtnn import RNN_LSTM

   model = RNN_LSTM(
       feature_channel=6,    # Input features
       output_channel=4,     # Output channels
       hidden_size=128,      # Hidden state size
       num_layers=3          # Number of LSTM layers
   )

**Architecture:**
- Bidirectional LSTM: captures forward and backward dependencies
- Conv1d output: projects hidden states to output channels

GRU (Gated Recurrent Unit)
~~~~~~~~~~~~~~~~~~~~~~~~~~

Similar to LSTM but with fewer parameters.

.. code-block:: python

   from rtnn import RNN_GRU

   model = RNN_GRU(
       feature_channel=6,
       output_channel=4,
       hidden_size=128,
       num_layers=3
   )

Transformer
-----------

Self-attention based encoder for sequence processing.

.. code-block:: python

   from rtnn import TransformerEncoder

   model = TransformerEncoder(
       feature_channel=6,
       output_channel=4,
       embed_size=64,        # Embedding dimension
       num_layers=2,         # Number of transformer blocks
       heads=4,              # Attention heads
       forward_expansion=4,  # Feed-forward expansion factor
       seq_length=10,        # Input sequence length
       dropout=0.1           # Dropout rate
   )

**Features:**
- Positional embeddings
- Multi-head self-attention
- Residual connections
- Layer normalization

FCN (Fully Connected Network)
-----------------------------

Deep fully connected network with batch normalization.

.. code-block:: python

   from rtnn import FCN

   model = FCN(
       feature_channel=6,
       output_channel=4,
       num_layers=3,         # Number of hidden layers
       hidden_size=196,      # Hidden layer size
       seq_length=10,        # Input sequence length
       dim_expand=0          # Optional sequence expansion
   )

**Architecture:**
- Flattens input: (batch, channels, seq) → (batch, channels * seq)
- FCBlock: Linear → BatchNorm → ReLU
- Optional sequence length expansion


Model Comparison
----------------

.. list-table::
   :widths: 20 25 25 30
   :header-rows: 1

   * - Architecture
     - Parameters
     - Best For
     - Pros/Cons
   * - LSTM/GRU
     - Moderate
     - Temporal dependencies
     - Good for sequences, can be slow
   * - Transformer
     - Large
     - Long-range dependencies
     - Parallel processing, memory intensive
   * - FCN
     - Moderate
     - Simple relationships
     - Fast, no temporal modeling

Choosing an Architecture
------------------------

1. **LSTM/GRU**: Default choice for temporal sequences
2. **Transformer**: For long sequences or when attention is important
3. **FCN**: For simple regression tasks without temporal structure

See :doc:`training_strategy` for hyperparameter recommendations.