mamba
Mamba-2 selective state space block.
Implements the Mamba-2 architecture (Dao & Gu, 2024) via its State Space
Duality (SSD) chunked-matmul algorithm: the recurrence is split along the
sequence axis T into chunks of length Q, with attention-style dense matmuls
inside each chunk and a much shorter associative scan over the n_chunks
chunk-boundary states (one per chunk). This
keeps activation memory at O(B·T·H·N/Q + B·n_chunks·H·N·P) and turns
the inner loop into tensor-core-friendly GEMMs.
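
The chunked decomposition described above can be illustrated with a minimal NumPy sketch. It is not this module's implementation: it assumes a single batch element and a single head, a scalar per-step decay `a[t]`, and a sequence length that is a multiple of Q. The function names and argument layout (`ssm_sequential`, `ssm_chunked`, `x`, `a`, `Bm`, `C`) are hypothetical and chosen only for illustration. For readability, the chunk-boundary states are carried by a plain Python loop rather than an associative scan.

```python
import numpy as np


def ssm_sequential(x, a, Bm, C):
    """Reference recurrence: h_t = a_t * h_{t-1} + outer(B_t, x_t), y_t = C_t @ h_t."""
    T, P = x.shape
    N = Bm.shape[1]
    h = np.zeros((N, P))
    y = np.empty((T, P))
    for t in range(T):
        h = a[t] * h + np.outer(Bm[t], x[t])
        y[t] = C[t] @ h
    return y


def ssm_chunked(x, a, Bm, C, Q):
    """Same output, computed chunk by chunk with dense matmuls (illustrative only)."""
    T, P = x.shape
    N = Bm.shape[1]
    assert T % Q == 0, "this sketch assumes T is a multiple of Q"
    h = np.zeros((N, P))          # state entering the current chunk
    y = np.empty((T, P))
    for s in range(0, T, Q):
        xc, ac = x[s:s + Q], a[s:s + Q]
        Bc, Cc = Bm[s:s + Q], C[s:s + Q]
        cum = np.cumsum(np.log(ac))           # cum[i] = sum of log a_k for k <= i
        # L[i, j] = prod_{k=j+1..i} a_k for j <= i, zero above the diagonal
        L = np.tril(np.exp(cum[:, None] - cum[None, :]))
        # Intra-chunk term: masked, decay-weighted "attention" over the chunk
        y_intra = (L * (Cc @ Bc.T)) @ xc
        # Inter-chunk term: contribution of the incoming boundary state
        y_inter = np.exp(cum)[:, None] * (Cc @ h)
        y[s:s + Q] = y_intra + y_inter
        # Boundary-state update: decay the old state, add this chunk's inputs
        decay_to_end = np.exp(cum[-1] - cum)
        h = np.exp(cum[-1]) * h + (Bc * decay_to_end[:, None]).T @ xc
    return y
```

Both functions compute the same outputs; in the chunked version the only sequential dependency is the `(N, P)` boundary state passed from one chunk to the next, which is the part a real implementation would handle with a scan over T/Q states.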