qwen_3_5

Qwen 3.5 hybrid decoder block.

Each layer's layer_type selects the token mixer:

* "full_attention": GroupedSelfAttention with Q-output gate and per-head Q/K RMSNorm.
* "linear_attention": GatedDeltaNet.
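The selection by layer_type can be pictured as a simple dispatch. The following is a minimal sketch, not the module's actual code: make_token_mixer is a hypothetical helper, the two classes are empty stand-ins for the real mixers (whose constructor arguments are not shown on this page), and the layer layout is purely illustrative.

```python
from dataclasses import dataclass


# Stand-ins for the real mixers. The actual classes take configuration
# arguments that this page does not document.
@dataclass
class GroupedSelfAttention:
    hidden_size: int


@dataclass
class GatedDeltaNet:
    hidden_size: int


def make_token_mixer(layer_type: str, hidden_size: int):
    """Hypothetical helper: pick the token mixer for one layer."""
    if layer_type == "full_attention":
        return GroupedSelfAttention(hidden_size)
    if layer_type == "linear_attention":
        return GatedDeltaNet(hidden_size)
    raise ValueError(f"unknown layer_type: {layer_type!r}")


# Illustrative layout only; the real per-layer pattern comes from the model config.
layer_types = [
    "linear_attention",
    "linear_attention",
    "linear_attention",
    "full_attention",
]
mixers = [make_token_mixer(t, hidden_size=64) for t in layer_types]
mixer_names = [type(m).__name__ for m in mixers]
print(mixer_names)
```

Rejecting unknown strings up front keeps a typo in the layer layout from silently falling back to one of the two mixers.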

The MLP is set by the model: subclasses override _make_mlp to swap in a MoE FFN for the 35B-A3B variant.

Qwen3_5DecoderBlock

Bases: Module

Dense decoder block. Layer type is fixed at construction time.

Qwen3_5MoEDecoderBlock

Bases: Qwen3_5DecoderBlock

Same hybrid attention layout as the dense block; MoE FFN instead of MLP.
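The relationship between the two blocks is a factory-method hook: the base class builds its feed-forward part through _make_mlp, and the MoE subclass overrides only that method. The sketch below illustrates the pattern with plain Python classes. Everything except the _make_mlp name is an assumption made for illustration: the class names DecoderBlockSketch and MoEDecoderBlockSketch, the DenseMLP and MoEFFN stand-ins, and the hidden_size and num_experts parameters are hypothetical, not the real signatures.

```python
class DenseMLP:
    """Stand-in for the dense feed-forward network."""

    def __init__(self, hidden_size: int):
        self.hidden_size = hidden_size


class MoEFFN:
    """Stand-in for the mixture-of-experts feed-forward network."""

    def __init__(self, hidden_size: int, num_experts: int):
        self.hidden_size = hidden_size
        self.num_experts = num_experts


class DecoderBlockSketch:
    """Illustrative analogue of the dense block (not the real class)."""

    VALID_LAYER_TYPES = ("full_attention", "linear_attention")

    def __init__(self, layer_type: str, hidden_size: int):
        if layer_type not in self.VALID_LAYER_TYPES:
            raise ValueError(f"unknown layer_type: {layer_type!r}")
        # The layer type is chosen once, at construction time.
        self.layer_type = layer_type
        self.hidden_size = hidden_size
        # Subclasses customize the FFN by overriding this hook.
        self.mlp = self._make_mlp()

    def _make_mlp(self):
        return DenseMLP(self.hidden_size)


class MoEDecoderBlockSketch(DecoderBlockSketch):
    """Same attention layout as the base; only the FFN differs."""

    def __init__(self, layer_type: str, hidden_size: int, num_experts: int):
        # Set before super().__init__ because the base constructor
        # calls _make_mlp, which reads this attribute.
        self.num_experts = num_experts
        super().__init__(layer_type, hidden_size)

    def _make_mlp(self):
        return MoEFFN(self.hidden_size, self.num_experts)


dense = DecoderBlockSketch("full_attention", hidden_size=64)
moe = MoEDecoderBlockSketch("linear_attention", hidden_size=64, num_experts=8)
print(type(dense.mlp).__name__, type(moe.mlp).__name__)
```

Because only the hook is overridden, the token-mixer selection stays in one place and both variants share it.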