mamba
Mamba-2 language model.
A pure selective state space model (SSM) following the Mamba-2 architecture (Dao & Gu, 2024). The model uses no positional embeddings, because position information is implicit in the SSM recurrent state.
Inherits GPT's loss and unembed. Overrides __call__ to drop
the block-size assertion (SSMs have no fixed context-length limit), and
overrides setup, embed, and decode for SSM-specific structure.
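To make the "position is implicit in the recurrent state" point concrete, here is a minimal sketch of a toy linear recurrence written with jax.lax.scan. It is not the MambaBlock used by this model: the function name toy_ssm_scan, the fixed parameters a, b, c, and the input values are all assumptions chosen for illustration, and a real selective SSM would make the parameters depend on the input. The sketch only shows that the same values in a different order yield a different output, and that the scan runs on any sequence length.

```python
import jax
import jax.numpy as jnp


def toy_ssm_scan(x, a, b, c):
    """Toy diagonal SSM: h_t = a * h_{t-1} + b * x_t, y_t = c . h_t.

    x: (T,) input sequence; a, b, c: (N,) per-state parameters.
    Hypothetical illustration only, not this module's MambaBlock.
    """
    def step(h, x_t):
        h = a * h + b * x_t        # recurrent state update
        return h, jnp.dot(c, h)    # carry the state, emit one output

    h0 = jnp.zeros_like(a)
    _, y = jax.lax.scan(step, h0, x)
    return y


# Assumed toy parameters (not learned, not input-dependent).
a = jnp.array([0.9, 0.5])
b = jnp.array([1.0, 1.0])
c = jnp.array([1.0, 1.0])

x = jnp.array([1.0, 0.0, 0.0, 2.0])
x_reordered = jnp.array([2.0, 0.0, 0.0, 1.0])  # same values, different order

y = toy_ssm_scan(x, a, b, c)
y_reordered = toy_ssm_scan(x_reordered, a, b, c)

# The final outputs differ, so token order is encoded by the state itself.
order_matters = not bool(jnp.allclose(y[-1], y_reordered[-1]))

# No block size is baked in: the same function handles a longer sequence.
y_long = toy_ssm_scan(jnp.ones(1000), a, b, c)
```

Because the state decays older inputs more than recent ones, the recurrence distinguishes positions without any explicit positional term, which is the property the description above relies on.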
Mamba
Bases: GPT
Mamba-2 language model — SSM-only, no attention.
Overrides GPT's setup/embed/decode to use MambaBlock layers
and skip positional embeddings. loss and unembed are
inherited unchanged.
embed(idx: jax.Array, deterministic: bool = False, **kwargs: Any) -> Any
Token embeddings only — no positional encoding needed for SSMs.
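As a rough illustration of what a token-only embedding step looks like, the sketch below performs a plain table lookup in JAX. It is an assumption-laden stand-in, not the body of Mamba.embed: the function name embed_tokens_only, the table wte, and the sizes are hypothetical, and the deterministic flag and extra keyword arguments from the real signature are omitted.

```python
import jax
import jax.numpy as jnp

vocab_size, d_model = 16, 8  # hypothetical sizes for the sketch
key = jax.random.PRNGKey(0)
# Hypothetical token embedding table; a real model would learn this.
wte = jax.random.normal(key, (vocab_size, d_model)) * 0.02


def embed_tokens_only(idx, wte):
    """Look up token embeddings; no positional term is added."""
    return jnp.take(wte, idx, axis=0)  # (B, T) -> (B, T, d_model)


idx = jnp.array([[3, 7, 3]])  # token 3 appears at positions 0 and 2
h = embed_tokens_only(idx, wte)

# With no positional encoding, a repeated token maps to the same vector
# regardless of where it occurs in the sequence.
same_token_same_vector = bool(jnp.array_equal(h[0, 0], h[0, 2]))
```

In a GPT-style embed, a position-dependent vector would be added to each row, so the two occurrences of token 3 would differ; here they are identical and ordering is left to the SSM layers.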