moe
MoE
BiasBalancedMoE
Bases: MoE
Batch-local approximation of DeepSeek-style auxiliary-loss-free load balancing.
DeepSeek maintains a persistent per-expert bias that is updated from recent routing load. To avoid threading mutable model state through the codebase, this implementation instead computes a one-step balancing bias from the current batch's observed expert load and reroutes the batch once with that bias applied.
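The two-pass idea described above can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not this class's implementation: the names `route_topk`, `expert_load`, `bias_balanced_route`, and the `update_rate` parameter are hypothetical. It assumes positive affinity scores, a fixed-size sign step for the bias (modeled on DeepSeek's update rule), and that the bias changes only which experts are selected, while gate weights come from the unbiased scores.

```python
from typing import List, Optional, Tuple


def route_topk(scores: List[List[float]], k: int,
               bias: Optional[List[float]] = None) -> List[List[int]]:
    """Hypothetical helper: top-k expert indices per token.

    The optional per-expert bias is added only for selection.
    """
    num_experts = len(scores[0])
    if bias is None:
        bias = [0.0] * num_experts
    routes = []
    for row in scores:
        biased = [s + b for s, b in zip(row, bias)]
        # Highest biased score first; ties broken by lower expert index.
        order = sorted(range(num_experts), key=lambda e: (-biased[e], e))
        routes.append(order[:k])
    return routes


def expert_load(routes: List[List[int]], num_experts: int) -> List[int]:
    """Count how many token slots each expert received."""
    load = [0] * num_experts
    for chosen in routes:
        for e in chosen:
            load[e] += 1
    return load


def bias_balanced_route(scores: List[List[float]], k: int,
                        update_rate: float = 0.1
                        ) -> Tuple[List[List[int]], List[List[float]], List[float]]:
    """Hypothetical batch-local balancing: observe load, bias once, reroute once."""
    num_experts = len(scores[0])
    # Pass 1: unbiased routing, used only to measure this batch's load.
    load = expert_load(route_topk(scores, k), num_experts)
    mean_load = sum(load) / num_experts
    # Assumed sign step: +update_rate for under-loaded experts,
    # -update_rate for over-loaded ones, 0 for exactly balanced ones.
    bias = [update_rate * ((mean_load > l) - (mean_load < l)) for l in load]
    # Pass 2: reroute once with the bias applied to selection.
    routes = route_topk(scores, k, bias)
    # Gate weights: unbiased scores of the chosen experts, normalized per token
    # (assumes scores are positive affinities).
    gates = []
    for row, chosen in zip(scores, routes):
        total = sum(row[e] for e in chosen)
        gates.append([row[e] / total for e in chosen])
    return routes, gates, bias


# Usage: every token slightly prefers expert 0 before balancing.
scores = [[0.50, 0.45, 0.05],
          [0.52, 0.46, 0.02],
          [0.60, 0.30, 0.10],
          [0.48, 0.47, 0.05]]
routes, gates, bias = bias_balanced_route(scores, k=1, update_rate=0.1)
print(routes, bias)
```

With these inputs the unbiased pass sends all four tokens to expert 0 (load `[4, 0, 0]`). The single correction step then moves three of them to expert 1 (load `[1, 3, 0]`). This shows the trade-off of a batch-local, one-shot bias: it needs no persistent state, but one fixed-size step can overshoot, whereas a persistent bias can accumulate small updates across training steps.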