qwen_3_5

`qwen_3_5`

Qwen 3.5 hybrid decoder block.

Each layer's `layer_type` selects the token mixer:

* `"full_attention"`: `GroupedSelfAttention` with a Q-output gate and per-head Q/K RMSNorm.
* `"linear_attention"`: `GatedDeltaNet`.
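
The selection by `layer_type` can be pictured as a small dispatch. The following is a minimal sketch, not the module's actual code: the two classes are empty stand-ins that only borrow the names from this page, `make_token_mixer` is a hypothetical helper, and the three-to-one layer pattern is an illustrative assumption rather than the model's documented layout.

```python
from dataclasses import dataclass


# Hypothetical stand-ins: only the class names come from this page.
@dataclass
class GroupedSelfAttention:
    q_output_gate: bool = True   # gate applied to the attention output
    qk_rmsnorm: bool = True      # per-head RMSNorm on Q and K


@dataclass
class GatedDeltaNet:
    pass


def make_token_mixer(layer_type: str):
    """Return the token mixer for one layer (hypothetical helper)."""
    if layer_type == "full_attention":
        return GroupedSelfAttention()
    if layer_type == "linear_attention":
        return GatedDeltaNet()
    raise ValueError(f"unknown layer_type: {layer_type!r}")


# Illustrative pattern only; the real per-layer list comes from the model config.
layer_types = ["linear_attention"] * 3 + ["full_attention"]
mixers = [make_token_mixer(t) for t in layer_types]
mixer_names = [type(m).__name__ for m in mixers]
```

Rejecting unknown strings with an explicit error keeps a typo in the configuration from silently falling back to one mixer or the other.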

The MLP is chosen by the model: subclasses override `_make_mlp` to swap in a MoE for the 35B-A3B variant.
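
The override hook can be sketched as a plain factory method. This is a simplified illustration under stated assumptions: `DecoderBlockSketch`, `MoEDecoderBlockSketch`, `DenseMLP`, and `MoEFFN` are hypothetical placeholder names, the constructor signature is invented for the example, and only the `_make_mlp` hook name and the idea of a construction-time `layer_type` are taken from this page.

```python
class DenseMLP:
    """Placeholder for the dense feed-forward network (hypothetical)."""


class MoEFFN:
    """Placeholder for a mixture-of-experts feed-forward network (hypothetical)."""


class DecoderBlockSketch:
    """Simplified stand-in for the dense decoder block."""

    def __init__(self, layer_type: str):
        # The layer type is chosen once, when the block is built.
        self.layer_type = layer_type
        # The feed-forward part is built through an overridable hook.
        self.mlp = self._make_mlp()

    def _make_mlp(self):
        return DenseMLP()


class MoEDecoderBlockSketch(DecoderBlockSketch):
    """Same construction path; only the feed-forward factory differs."""

    def _make_mlp(self):
        return MoEFFN()


dense = DecoderBlockSketch("full_attention")
moe = MoEDecoderBlockSketch("linear_attention")
```

Because the base constructor calls the hook, the subclass inherits the attention setup unchanged and only has to say which feed-forward module to build.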

`Qwen3_5DecoderBlock`

Bases: Module

Dense decoder block. Layer type is fixed at construction time.

`Qwen3_5MoEDecoderBlock`

Bases: Qwen3_5DecoderBlock

Uses the same hybrid attention layout as the dense block, but with a MoE FFN in place of the dense MLP.
