
rmsnorm

RMSNorm

Bases: Module

Root-mean-square layer norm.

centered=True switches to the Qwen 3.5 / OLMo-2 convention: the multiplier is 1 + weight, with weight initialized to 0. The effective multiplier therefore still starts at 1, but the stored parameter matches the HF layout, so HF checkpoints round-trip.
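A minimal pure-Python sketch of the forward pass for a single vector, assuming the standard RMSNorm formula y = x / sqrt(mean(x^2) + eps) * scale over the feature dimension. The function name rms_norm and the eps default of 1e-6 are illustrative assumptions, not this module's API. It shows that centered=True with weight at 0 starts out identical to the plain variant with weight at 1, and that a conventional weight w corresponds to a stored centered parameter w - 1.

```python
import math
from typing import List, Sequence


def rms_norm(
    x: Sequence[float],
    weight: Sequence[float],
    eps: float = 1e-6,  # assumed default; the real module's eps may differ
    centered: bool = False,
) -> List[float]:
    """Normalize one vector by its root mean square, then scale it.

    centered=False: y_i = x_i / rms(x) * weight_i        (weight init: 1)
    centered=True:  y_i = x_i / rms(x) * (1 + weight_i)  (weight init: 0)
    """
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    # RMS over the feature dimension; eps guards against division by zero.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    # Effective per-feature multiplier under each convention.
    scale = [(1.0 + w) if centered else w for w in weight]
    return [v / rms * s for v, s in zip(x, scale)]


# Usage: both initializations produce the same output before training.
x = [1.0, -2.0, 3.0, -4.0]
standard = rms_norm(x, [1.0] * len(x))
centered = rms_norm(x, [0.0] * len(x), centered=True)
```

Storing weight - 1 instead of weight changes only the parameter's offset, not the function the layer computes.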

RMSNormGated

Bases: Module

RMSNorm whose output, after the weight is applied, is multiplied by silu(gate).

Used inside the gated-delta-net token mixer (the Qwen 3.5 linear-attention layers). The weight is initialized to ones, matching HF Qwen3_5RMSNormGated.
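A matching pure-Python sketch of the gated variant, assuming the output is the RMS-normalized input times the weight times silu(gate), elementwise over one feature vector. The name rms_norm_gated, its argument order, and the eps default of 1e-6 are hypothetical. With the documented ones-initialized weight, a zero gate closes the output entirely, because silu(0) = 0.

```python
import math
from typing import List, Sequence


def silu(z: float) -> float:
    """SiLU (swish): z * sigmoid(z), written to avoid exp overflow."""
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so this cannot overflow
    return z * e / (1.0 + e)


def rms_norm_gated(
    x: Sequence[float],
    gate: Sequence[float],
    weight: Sequence[float],
    eps: float = 1e-6,  # assumed default; the real module's eps may differ
) -> List[float]:
    """y_i = (x_i / rms(x)) * weight_i * silu(gate_i)."""
    if not (len(x) == len(gate) == len(weight)):
        raise ValueError("x, gate and weight must have the same length")
    # Normalize first, apply the weight, then the multiplicative gate.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * w * silu(g) for v, g, w in zip(x, gate, weight)]


# Usage with the documented init (weight = ones).
x = [1.0, -2.0, 3.0, -4.0]
weight = [1.0] * len(x)
closed = rms_norm_gated(x, [0.0] * len(x), weight)  # silu(0) = 0, so all zeros
```

Because the gate enters only through silu, large negative gate values drive the output toward zero and large positive values pass the normalized input through roughly scaled by the gate.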