mamba

Mamba-2 language model.

A pure selective state space model following the Mamba-2 architecture (Dao & Gu, 2024). No positional embeddings — position information is implicit in the SSM recurrent state.
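To illustrate how position can be implicit in the recurrent state, here is a minimal toy sketch of a diagonal SSM recurrence, h_t = a_t * h_{t-1} + b_t * x_t with output y_t = sum(c_t * h_t). The function name `ssm_scan`, the shapes, and the constant parameters are assumptions made for illustration; this is not the library's MambaBlock and omits Mamba-2's heads, gating, and convolution.

```python
import jax
import jax.numpy as jnp


def ssm_scan(x, a, b, c):
    """Run a toy diagonal SSM over one sequence (hypothetical helper).

    x: (T,) scalar inputs; a, b, c: (T, N) per-step parameters.
    Recurrence: h_t = a_t * h_{t-1} + b_t * x_t, y_t = sum(c_t * h_t).
    """
    def step(h, inputs):
        x_t, a_t, b_t, c_t = inputs
        h = a_t * h + b_t * x_t      # decay the old state, write the new input
        y_t = jnp.sum(c_t * h)       # read out from the state
        return h, y_t

    h0 = jnp.zeros(a.shape[-1])
    _, y = jax.lax.scan(step, h0, (x, a, b, c))
    return y


T, N = 3, 2
a = jnp.full((T, N), 0.5)   # constant decay here; a selective SSM makes it input-dependent
b = jnp.ones((T, N))
c = jnp.ones((T, N))

# The same input value placed early versus late leaves a different trace in the state.
y_early = ssm_scan(jnp.array([1.0, 0.0, 0.0]), a, b, c)
y_late = ssm_scan(jnp.array([0.0, 0.0, 1.0]), a, b, c)
```

The final output differs between the two sequences because the early input has decayed twice by the last step, so order is encoded without any positional table.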

Inherits GPT's loss and unembed. Overrides __call__ to drop the block-size assertion (SSMs have no fixed context-length limit), and overrides setup, embed, and decode for SSM-specific structure.
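The override pattern described above can be sketched with plain Python stand-ins. `ToyGPT` and `ToyMamba` are hypothetical classes invented for this example, and the integer arithmetic only mimics the data flow; the real classes are neural network modules whose internals are not shown here.

```python
class ToyGPT:
    """Hypothetical stand-in for the GPT base class."""

    def __init__(self, block_size):
        self.block_size = block_size

    def embed(self, idx):
        # Toy "token + position" term standing in for positional embeddings.
        return [tok + pos for pos, tok in enumerate(idx)]

    def decode(self, x):
        return x

    def unembed(self, x):
        return x

    def __call__(self, idx):
        # Fixed context window: longer sequences are rejected.
        assert len(idx) <= self.block_size, "sequence exceeds block_size"
        return self.unembed(self.decode(self.embed(idx)))


class ToyMamba(ToyGPT):
    """Hypothetical stand-in: overrides embed and __call__, inherits unembed."""

    def embed(self, idx):
        # Token term only; no positional term is added.
        return list(idx)

    def __call__(self, idx):
        # Same pipeline as the base class, minus the block-size assertion.
        return self.unembed(self.decode(self.embed(idx)))


model = ToyMamba(block_size=2)
out = model([5, 6, 7])  # length 3 exceeds block_size, but no assertion fires
```

In this sketch, calling `ToyGPT(block_size=2)` on the same three-token input raises an `AssertionError`, while the subclass processes it.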

Mamba

Bases: GPT

Mamba-2 language model — SSM-only, no attention.

Overrides GPT's setup/embed/decode to use MambaBlock layers and skip positional embeddings. loss and unembed are inherited unchanged.

embed(idx: jax.Array, deterministic: bool = False, **kwargs: Any) -> Any

Token embeddings only — no positional encoding needed for SSMs.
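As a rough sketch of what a token-only embedding amounts to, the lookup below indexes an embedding table and adds nothing else. The helper name `embed_tokens`, the table `wte`, and the sizes are assumptions for illustration; the actual method may also apply dropout controlled by `deterministic`, which the description above does not specify.

```python
import jax
import jax.numpy as jnp


def embed_tokens(wte, idx):
    """Look up token embeddings; no positional table is added (hypothetical helper)."""
    return wte[idx]


vocab_size, d_model = 16, 8
wte = jax.random.normal(jax.random.PRNGKey(0), (vocab_size, d_model))

idx = jnp.array([[3, 7, 3]])   # (batch, time); token 3 appears at positions 0 and 2
x = embed_tokens(wte, idx)     # (batch, time, d_model)
```

Because no positional term is added, the repeated token maps to the same vector at both positions; any order information has to come from the SSM layers that follow.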