hybrid

Hybrid Transformer + Mamba model.

Interleaves standard transformer (attention) blocks with Mamba-2 SSM blocks, following the Jamba architecture pattern (Lieber et al., 2024).

Inherits GPT's loss, unembed, and __call__. Only setup, components, sharding, and _parse_mamba_layers are new; embed and decode are minor overrides.

Hybrid

Bases: GPT

Hybrid Transformer + Mamba language model.

mamba_layers controls which layers use Mamba blocks:

- "even": even-indexed layers (0, 2, 4, ...) are Mamba
- "odd": odd-indexed layers (1, 3, 5, ...) are Mamba
- comma-separated indices: e.g. "0,2,4,6" for explicit control
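
The three accepted forms of mamba_layers can be illustrated with a small standalone sketch. This is not the library's _parse_mamba_layers; the function names parse_mamba_layers and layer_kinds, the n_layers argument, and the range check are all assumptions made for illustration.

```python
def parse_mamba_layers(spec: str, n_layers: int) -> set[int]:
    """Hypothetical helper: map a mamba_layers spec to a set of layer indices."""
    spec = spec.strip().lower()
    if spec == "even":
        return set(range(0, n_layers, 2))
    if spec == "odd":
        return set(range(1, n_layers, 2))
    # Otherwise treat the spec as comma-separated integer indices.
    indices = {int(tok) for tok in spec.split(",") if tok.strip()}
    # Assumption: indices outside [0, n_layers) are rejected rather than ignored.
    out_of_range = sorted(i for i in indices if not 0 <= i < n_layers)
    if out_of_range:
        raise ValueError(f"layer indices out of range for {n_layers} layers: {out_of_range}")
    return indices


def layer_kinds(spec: str, n_layers: int) -> list[str]:
    """Hypothetical helper: label each layer as 'mamba' or 'attention'."""
    mamba = parse_mamba_layers(spec, n_layers)
    return ["mamba" if i in mamba else "attention" for i in range(n_layers)]


if __name__ == "__main__":
    print(layer_kinds("odd", 4))  # ['attention', 'mamba', 'attention', 'mamba']
```

Returning a set keeps the per-layer membership check cheap and makes duplicate indices in an explicit list harmless; whether the real implementation validates the range or tolerates duplicates is not stated in the source.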