bias_balanced
Bias-balanced MoE routing.
BiasBalancedMoE
Bases: MoE
Batch-local approximation of DeepSeek-style loss-free balancing.
DeepSeek updates a persistent per-expert bias using recent routing load. To avoid threading mutable model state through the codebase, this implementation instead computes a one-step balancing bias from the current batch's observed load and reroutes once with that bias applied.
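The route, measure, reroute idea described above can be illustrated with a small pure-Python sketch. It is not the code of `BiasBalancedMoE`. The function names, the `step_size` parameter, and the sign-based bias rule (borrowed from the general shape of loss-free balancing) are all assumptions made for illustration, and the actual class may compute its bias differently.

```python
def top_k_experts(scores, k, bias=None):
    """Pick the k highest-scoring experts per token (hypothetical helper).

    `bias` is added to the scores for selection only; ties go to the
    lower expert index.
    """
    num_experts = len(scores[0])
    if bias is None:
        bias = [0.0] * num_experts
    chosen = []
    for row in scores:
        biased = [s + b for s, b in zip(row, bias)]
        order = sorted(range(num_experts), key=lambda e: (-biased[e], e))
        chosen.append(order[:k])
    return chosen


def expert_load(chosen, num_experts):
    """Count how many token slots each expert received."""
    load = [0] * num_experts
    for experts in chosen:
        for e in experts:
            load[e] += 1
    return load


def one_step_bias(load, step_size):
    """Assumed rule: +step_size for underloaded experts, -step_size for
    overloaded ones, 0 for experts exactly at the mean load."""
    mean_load = sum(load) / len(load)
    return [step_size * ((mean_load > l) - (mean_load < l)) for l in load]


def route_bias_balanced(scores, k, step_size):
    """Route once, derive a bias from this batch's load, then reroute once."""
    first_pass = top_k_experts(scores, k)
    load = expert_load(first_pass, len(scores[0]))
    bias = one_step_bias(load, step_size)
    return top_k_experts(scores, k, bias), load, bias


# Four tokens, two experts; unbiased routing sends every token to expert 0.
scores = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.5], [0.55, 0.5]]
final, load, bias = route_bias_balanced(scores, k=1, step_size=0.1)
print(load)   # [4, 0]
print(bias)   # [-0.1, 0.1]
print(final)  # [[0], [0], [1], [1]]
```

In this toy batch, the bias computed from the first pass is enough to move the two tokens with the closest scores onto the idle expert. Because nothing persists between calls, each batch starts again from a zero bias, which is the trade-off against a persistent bias that the description above points to.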