qwen_3_5
Qwen 3.5 hybrid decoder block.
Each layer's layer_type selects the token mixer:
* "full_attention": GroupedSelfAttention with a Q-output gate and
per-head Q/K RMSNorm.
* "linear_attention": GatedDeltaNet.
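The per-layer choice of token mixer can be pictured as a lookup from the layer_type string to a mixer constructor. This is a minimal sketch, assuming the layer types are available as a list with one entry per layer; make_token_mixer, MIXERS, DecoderBlockSketch, and the two stub classes are hypothetical stand-ins, not the actual GroupedSelfAttention or GatedDeltaNet implementations, and the layer layout shown is illustrative only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


class GroupedSelfAttentionStub:
    """Hypothetical placeholder for full attention (Q-output gate, per-head Q/K RMSNorm)."""

    def __init__(self, hidden_size: int) -> None:
        self.hidden_size = hidden_size


class GatedDeltaNetStub:
    """Hypothetical placeholder for the GatedDeltaNet linear-attention mixer."""

    def __init__(self, hidden_size: int) -> None:
        self.hidden_size = hidden_size


# Hypothetical registry mapping each layer_type string to a mixer constructor.
MIXERS: Dict[str, Callable[[int], object]] = {
    "full_attention": GroupedSelfAttentionStub,
    "linear_attention": GatedDeltaNetStub,
}


def make_token_mixer(layer_type: str, hidden_size: int) -> object:
    """Build the token mixer for one layer; unknown layer types fail loudly."""
    try:
        mixer_cls = MIXERS[layer_type]
    except KeyError:
        raise ValueError(f"unknown layer_type: {layer_type!r}") from None
    return mixer_cls(hidden_size)


@dataclass
class DecoderBlockSketch:
    layer_type: str
    hidden_size: int
    mixer: object = field(init=False)

    def __post_init__(self) -> None:
        # Only the token mixer depends on layer_type in this sketch.
        self.mixer = make_token_mixer(self.layer_type, self.hidden_size)


# Illustrative layout: three linear-attention layers, then one full-attention layer.
layer_types = ["linear_attention"] * 3 + ["full_attention"]
blocks = [DecoderBlockSketch(t, hidden_size=64) for t in layer_types]
print([type(b.mixer).__name__ for b in blocks])
# prints ['GatedDeltaNetStub', 'GatedDeltaNetStub', 'GatedDeltaNetStub', 'GroupedSelfAttentionStub']
```

Raising on an unrecognized layer_type, rather than falling back to a default mixer, keeps a misconfigured layer from silently getting the wrong kind of attention.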
The MLP is chosen by the model rather than by layer_type: subclasses override
_make_mlp to swap in a mixture-of-experts (MoE) MLP for the 35B-A3B variant.
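The _make_mlp override pattern can be sketched as a template-method hook, assuming the block builds its MLP in __init__ by calling self._make_mlp(). HybridDecoderBlock, MoEDecoderBlock, DenseMLP, ToyMoE, and the toy top-k softmax router below are hypothetical illustrations on plain Python lists; they are not the real dense MLP or the 35B-A3B routing scheme, whose expert count and routing details are not given here.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


class DenseMLP:
    """Hypothetical dense MLP stand-in: an elementwise ReLU."""

    def __call__(self, x: Sequence[float]) -> Vector:
        return [max(0.0, v) for v in x]


class ToyMoE:
    """Toy mixture-of-experts: keep the top_k experts by router score and mix
    their outputs with softmax weights computed over the selected scores only."""

    def __init__(
        self,
        experts: Sequence[Callable[[Sequence[float]], Vector]],
        router: Callable[[Sequence[float]], Vector],
        top_k: int,
    ) -> None:
        self.experts = list(experts)
        self.router = router
        self.top_k = top_k

    def __call__(self, x: Sequence[float]) -> Vector:
        scores = self.router(x)
        # Indices of the top_k highest-scoring experts (ties keep expert order).
        chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[: self.top_k]
        peak = max(scores[i] for i in chosen)  # subtract the max for numerical stability
        weights = [math.exp(scores[i] - peak) for i in chosen]
        total = sum(weights)
        out = [0.0] * len(x)
        for w, i in zip(weights, chosen):
            # Only the chosen experts are evaluated (sparse activation).
            for j, v in enumerate(self.experts[i](x)):
                out[j] += (w / total) * v
        return out


class HybridDecoderBlock:
    """Base block: builds its MLP through the _make_mlp hook."""

    def __init__(self, hidden_size: int) -> None:
        self.hidden_size = hidden_size
        self.mlp = self._make_mlp()

    def _make_mlp(self) -> Callable[[Sequence[float]], Vector]:
        return DenseMLP()

    def mlp_residual(self, x: Sequence[float]) -> Vector:
        # Residual connection around the MLP (token mixer and norms omitted).
        return [a + b for a, b in zip(x, self.mlp(x))]


class MoEDecoderBlock(HybridDecoderBlock):
    """Subclass that swaps the dense MLP for the toy MoE by overriding the hook."""

    def _make_mlp(self) -> Callable[[Sequence[float]], Vector]:
        # Four toy experts that scale the input by 1, 2, 3 and 4.
        experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]

        def toy_router(x: Sequence[float]) -> Vector:
            # Fixed toy scores that favour experts 2 and 3 equally.
            return [0.0, 0.0, 1.0, 1.0]

        return ToyMoE(experts, toy_router, top_k=2)


dense = HybridDecoderBlock(hidden_size=2)
moe = MoEDecoderBlock(hidden_size=2)
print(dense.mlp_residual([1.0, -2.0]))  # [2.0, -2.0]
print(moe.mlp_residual([1.0, 2.0]))  # [4.5, 9.0]
```

Because only _make_mlp differs, the layer_type dispatch and the rest of the block stay shared between the dense and MoE variants. Calling an overridable hook from __init__ does mean the subclass method runs before any code a subclass __init__ would add afterwards, so the hook should not rely on attributes set there.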