qwen_3_5_moe
Text-only Qwen 3.5 MoE (35B-A3B).
Subclasses :class:`Qwen3_5`. The hybrid attention layout is the same, but each
layer's MLP is a routed top-k MoE with a sigmoid-gated shared expert.
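
A minimal JAX sketch of such a layer may help make the data flow concrete. It is not the implementation in this module: the names `swiglu`, `moe_mlp`, and the `params` keys are hypothetical, the SwiGLU expert form, the softmax router, and the renormalization of the top-k weights are assumptions, and every expert is evaluated densely for readability rather than speed.

```python
import jax
import jax.numpy as jnp


def swiglu(x, w_gate, w_up, w_down):
    # x: [tokens, hidden]; w_gate, w_up: [hidden, inter]; w_down: [inter, hidden]
    return (jax.nn.silu(x @ w_gate) * (x @ w_up)) @ w_down


def moe_mlp(x, params, top_k):
    # Router scores per token (softmax routing is an assumption of this sketch).
    logits = x @ params["router"]                       # [tokens, experts]
    probs = jax.nn.softmax(logits, axis=-1)
    top_w, top_idx = jax.lax.top_k(probs, top_k)        # [tokens, top_k]
    top_w = top_w / top_w.sum(axis=-1, keepdims=True)   # renormalize (assumption)

    # Scatter the top-k weights back into a dense [tokens, experts] matrix;
    # experts that were not selected get weight 0.
    one_hot = jax.nn.one_hot(top_idx, logits.shape[-1])  # [tokens, top_k, experts]
    dense_w = jnp.sum(one_hot * top_w[..., None], axis=1)

    # Run every expert on every token by vmapping over the expert axis.
    per_expert = jax.vmap(lambda wg, wu, wd: swiglu(x, wg, wu, wd))(
        params["gate"], params["up"], params["down"]
    )                                                   # [experts, tokens, hidden]
    routed = jnp.einsum("te,eth->th", dense_w, per_expert)

    # Shared expert, scaled by a per-token sigmoid gate.
    shared = swiglu(x, *params["shared"])
    shared_gate = jax.nn.sigmoid(x @ params["shared_gate"])  # [tokens, 1]
    return routed + shared_gate * shared
```

Here `params["gate"]`, `params["up"]`, and `params["down"]` carry a leading expert axis, `params["shared"]` is a `(w_gate, w_up, w_down)` tuple, and `params["shared_gate"]` has shape `[hidden, 1]`.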
Hugging Face (HF) checkpoints store the experts as fused 3D tensors,
gate_up_proj and down_proj. The loader splits gate_up_proj along the
intermediate axis into our separate, vmapped gate and up MLPs.
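
The split can be sketched as below. The helper name `split_gate_up` is hypothetical, and the layout is an assumption: this sketch takes the fused tensor to be `[num_experts, hidden, 2 * intermediate]` with the gate half stored first. If the checkpoint uses a different axis order or half order, the `axis` argument and the unpacking order would need to change accordingly.

```python
import jax.numpy as jnp


def split_gate_up(gate_up_proj):
    # Assumed layout: [num_experts, hidden, 2 * intermediate], gate half first.
    # jnp.split with an integer cuts the axis into that many equal parts.
    gate, up = jnp.split(gate_up_proj, 2, axis=-1)
    return gate, up
```

Each result keeps the leading expert axis, so it can be fed directly to a function vmapped over experts.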