
base

Base self-attention module with hook-based API and KV cache support.

All Q/K/V hooks operate on tensors of shape (B, T, H, D). Subclasses customize behavior by overriding _project_inner, preprocess_qkv, build_mask, attn, postprocess_attn, and output_proj.

The KV cache is activated by calling model.apply(..., mutable=['cache']); without mutable=['cache'], the cache is a no-op (training mode).

SelfAttention

Bases: Module

project(x: jax.Array) -> Tuple[jax.Array, jax.Array, jax.Array]

Project the input to (q, k, v) by calling _project_inner.

preprocess_qkv(q: jax.Array, k: jax.Array, v: jax.Array, **kwargs: Any) -> Tuple[jax.Array, jax.Array, jax.Array]

Hook for RoPE, KV repeat, etc. Input/output: (B, T, H, D).
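As an illustration of this hook, here is a sketch of a standalone RoPE override operating directly on (B, T, H, D) tensors. The rope helper and its pairing convention (first half of D rotated against the second half) are assumptions, not the library's implementation:

```python
import jax.numpy as jnp

def rope(x, base=10000.0):
    # x: (B, T, H, D), D even. Rotate channel pairs by position-dependent angles.
    B, T, H, D = x.shape
    half = D // 2
    freqs = base ** (-jnp.arange(half) / half)        # (half,)
    angles = jnp.arange(T)[:, None] * freqs[None, :]  # (T, half)
    cos = jnp.cos(angles)[None, :, None, :]           # (1, T, 1, half)
    sin = jnp.sin(angles)[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return jnp.concatenate([x1 * cos - x2 * sin,
                            x1 * sin + x2 * cos], axis=-1)

def preprocess_qkv(q, k, v, **kwargs):
    # RoPE applies to queries and keys only; values pass through unchanged.
    return rope(q), rope(k), v
```

Position 0 is left unchanged (all angles are zero there), and the rotation preserves each vector's norm, which makes the hook easy to sanity-check.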

build_mask(t: int, padding_mask: Optional[jax.Array], **kwargs: Any) -> Optional[jax.Array]

Construct attention mask. Returns bool mask or None.
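A typical override combines a causal mask with the padding mask. The sketch below assumes the convention that True means "attend" and that padding_mask is (B, T) with True at real tokens; the base class's actual conventions may differ:

```python
import jax.numpy as jnp

def build_mask(t, padding_mask=None, **kwargs):
    # Lower-triangular (T, T) boolean mask: True = may attend.
    causal = jnp.tril(jnp.ones((t, t), dtype=bool))
    if padding_mask is None:
        return causal[None, :, :]                    # (1, T, T), broadcasts over batch
    # padding_mask: (B, T), True = real token; mask out padded *keys*.
    return causal[None, :, :] & padding_mask[:, None, :]
```

Returning None (e.g. for full bidirectional attention with no padding) lets attn skip the masking branch entirely.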

attn(q: jax.Array, k: jax.Array, v: jax.Array, mask: Optional[jax.Array] = None, **kwargs: Any) -> jax.Array

Core attention. Input q: (B, T_q, H, D); k, v: (B, T_kv, H, D). Output: (B, T_q, H, D).
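For reference, a plain scaled-dot-product implementation with these shapes can be sketched as below. The einsum layout and the (B or 1, T_q, T_kv) mask shape are assumptions consistent with the hook signatures above, not necessarily the base class's internals:

```python
import jax
import jax.numpy as jnp

def attn(q, k, v, mask=None, **kwargs):
    # q: (B, T_q, H, D); k, v: (B, T_kv, H, D) -> (B, T_q, H, D)
    scale = q.shape[-1] ** -0.5
    logits = jnp.einsum('bqhd,bkhd->bhqk', q, k) * scale
    if mask is not None:
        # mask: (B or 1, T_q, T_kv) bool, True = attend; broadcast over heads.
        logits = jnp.where(mask[:, None, :, :], logits,
                           jnp.finfo(logits.dtype).min)
    weights = jax.nn.softmax(logits, axis=-1)
    return jnp.einsum('bhqk,bkhd->bqhd', weights, v)
```

With all-zero queries the softmax is uniform over the unmasked keys, so each output row is the mean of the visible values, which is a convenient correctness check.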

postprocess_attn(y: jax.Array, padding_mask: Optional[jax.Array], deterministic: bool, **kwargs: Any) -> jax.Array

Post-attention processing. Input/output: (B, T, H, D).
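One common use of this hook is zeroing outputs at padded positions (with dropout slotted in when deterministic is False). The version below is illustrative only and again assumes padding_mask is (B, T) with True at real tokens:

```python
import jax.numpy as jnp

def postprocess_attn(y, padding_mask, deterministic, **kwargs):
    # y: (B, T, H, D). Zero the outputs at padded positions; a dropout
    # layer would be applied here when deterministic is False.
    if padding_mask is not None:
        y = y * padding_mask[:, :, None, None]
    return y
```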

output_proj(y: jax.Array) -> jax.Array

Output projection. (B, T, C) → (B, T, C).
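Since postprocess_attn emits (B, T, H, D) while output_proj consumes (B, T, C), the base class presumably merges heads in between with C = H * D. A minimal sketch of that merge plus a linear projection (merge_heads and w_o are illustrative names, not part of the documented API):

```python
import jax.numpy as jnp

def merge_heads(y):
    # (B, T, H, D) -> (B, T, C) with C = H * D
    B, T, H, D = y.shape
    return y.reshape(B, T, H * D)

def output_proj(y_btc, w_o):
    # (B, T, C) @ (C, C) -> (B, T, C)
    return y_btc @ w_o
```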