bias_balanced

Bias-balanced MoE routing.

BiasBalancedMoE

Bases: MoE

Batch-local approximation of DeepSeek-style loss-free balancing.

DeepSeek maintains a persistent per-expert bias and updates it from recent routing load. To avoid threading mutable model state through the repo, this implementation instead routes the current batch once, computes a one-step balancing bias from the observed expert load, and then reroutes once with that bias applied.
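
The two-pass idea described above can be illustrated with a minimal, framework-free sketch. This is not the `BiasBalancedMoE` code itself: the function names, the `step_size` parameter, and the sign-based bias rule (raise the bias of under-loaded experts, lower it for over-loaded ones by a fixed step) are assumptions chosen for illustration, and the actual class may compute the bias differently.

```python
def top_k_experts(scores, k):
    """Return, for each token, the indices of its k highest-scoring experts."""
    return [
        sorted(range(len(row)), key=lambda e: row[e], reverse=True)[:k]
        for row in scores
    ]


def expert_load(assignments, num_experts):
    """Count how many token slots each expert received."""
    load = [0] * num_experts
    for chosen in assignments:
        for expert in chosen:
            load[expert] += 1
    return load


def bias_balanced_route(scores, k, step_size=0.1):
    """Route once, derive a bias from this batch's load, then reroute once.

    `step_size` is a hypothetical knob for this sketch, not a documented
    parameter of BiasBalancedMoE.
    """
    num_experts = len(scores[0])

    # Pass 1: plain top-k routing on the raw router scores.
    first_pass = top_k_experts(scores, k)
    load = expert_load(first_pass, num_experts)
    mean_load = sum(load) / num_experts

    # Assumed rule: +step for under-loaded experts, -step for over-loaded
    # ones, 0 for experts exactly at the mean load.
    bias = [step_size * ((l < mean_load) - (l > mean_load)) for l in load]

    # Pass 2: reroute with the bias added to every token's scores.
    biased = [[s + b for s, b in zip(row, bias)] for row in scores]
    return top_k_experts(biased, k), bias


scores = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.55, 0.45],
    [0.52, 0.48],
]
routes, bias = bias_balanced_route(scores, k=1, step_size=0.1)
print(routes)  # [[0], [0], [1], [1]]
print(bias)    # [-0.1, 0.1]
```

In this toy batch every token initially picks expert 0, so the one-step bias penalizes expert 0 and favors expert 1, and the two tokens with the closest scores move over on the second pass. Because the bias is recomputed from each batch, nothing has to persist between calls, which is the trade-off the description above makes against DeepSeek's persistent bias.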