moe

MoE

Bases: Module

Base MoE feed-forward layer with fixed-capacity expert packing.
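Fixed-capacity packing means each expert owns a buffer of at most `capacity` token slots, and tokens that overflow that buffer are dropped. A minimal sketch of the standard cumulative-sum slot-assignment trick (the function name and flat `[N]` token layout are illustrative, not taken from this module):

```python
import jax
import jax.numpy as jnp

def pack_to_capacity(expert_idx, num_experts, capacity):
    """Assign each routed token a slot in its expert's fixed-size buffer.

    expert_idx: [N] int array of expert assignments (hypothetical flat layout).
    Tokens that overflow an expert's capacity are dropped (keep = False).
    """
    onehot = jax.nn.one_hot(expert_idx, num_experts, dtype=jnp.int32)  # [N, E]
    # Running count of tokens per expert gives each token its slot position
    # (1-based where routed, 0 elsewhere).
    position = jnp.cumsum(onehot, axis=0) * onehot                     # [N, E]
    slot = jnp.sum(position, axis=-1) - 1                              # [N], 0-based
    keep = slot < capacity                                             # drop overflow
    return slot, keep

idx = jnp.array([0, 1, 0, 0, 1])
slot, keep = pack_to_capacity(idx, num_experts=2, capacity=2)
# The third token routed to expert 0 (index 3) overflows and is dropped.
```

Dropping overflow tokens keeps every buffer a static shape, which is what makes the layer JIT-friendly.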

__call__(x: jax.Array, deterministic: bool = False) -> jax.Array

Apply top-k expert routing to x of shape [B, T, H].
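Top-k routing selects, for every token, the k experts with the highest gate scores and renormalizes their weights. A self-contained sketch under assumed names (`w_gate` and `topk_route` are illustrative, not this module's API):

```python
import jax
import jax.numpy as jnp

def topk_route(x, w_gate, k=2):
    """Return per-token combine weights and expert indices.

    x: [B, T, H] activations; w_gate: [H, E] hypothetical gating weights.
    """
    logits = jnp.einsum("bth,he->bte", x, w_gate)   # [B, T, E]
    probs = jax.nn.softmax(logits, axis=-1)
    topk_probs, topk_idx = jax.lax.top_k(probs, k)  # both [B, T, k]
    # Renormalize so the k selected experts' weights sum to 1 per token.
    topk_probs = topk_probs / jnp.sum(topk_probs, axis=-1, keepdims=True)
    return topk_probs, topk_idx

B, T, H, E = 2, 4, 8, 4
x = jax.random.normal(jax.random.PRNGKey(0), (B, T, H))
w = jax.random.normal(jax.random.PRNGKey(1), (H, E))
weights, idx = topk_route(x, w)
```

Each token's output is then the weighted sum of its k experts' feed-forward outputs, gathered through the capacity-packed buffers.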

BiasBalancedMoE

Bases: MoE

Batch-local approximation of DeepSeek-style loss-free balancing.

DeepSeek maintains a persistent per-expert bias that is updated from recent routing load. To avoid threading mutable model state through the repo, this implementation instead computes a one-step balancing bias from the current batch's observed load and reroutes once with that bias applied.
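The route-measure-reroute step described above can be sketched as follows (a minimal illustration; the function name and the `bias_scale` knob are assumptions, not this class's API):

```python
import jax
import jax.numpy as jnp

def bias_balanced_topk(logits, k=2, bias_scale=1.0):
    """Batch-local, one-step approximation of loss-free balancing."""
    E = logits.shape[-1]
    # Pass 1: route without bias and measure each expert's observed load
    # as its fraction of all routed slots in this batch.
    _, idx = jax.lax.top_k(logits, k)
    load = jnp.bincount(idx.reshape(-1), length=E) / idx.size
    # One-step bias: boost underloaded experts, penalize overloaded ones.
    bias = bias_scale * (1.0 / E - load)
    # Pass 2: reroute once with the bias added to the routing logits only
    # (as in DeepSeek, the bias affects selection, not the combine weights).
    _, idx_balanced = jax.lax.top_k(logits + bias, k)
    return idx_balanced, bias

logits = jax.random.normal(jax.random.PRNGKey(0), (32, 8))  # [tokens, experts]
idx, bias = bias_balanced_topk(logits)
```

Because `load` sums to 1 over the experts, the bias sums to zero: it redistributes routing pressure within the batch without shifting the overall logit scale.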