moe

MoE

Bases: Module

Base MoE feed-forward layer with fixed-capacity expert packing.
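Fixed-capacity packing means each expert owns a buffer of at most `capacity` token slots, and tokens that overflow that buffer are dropped. A minimal sketch of the standard cumulative-sum slot-assignment trick (the function name and flat `[N]` token layout are illustrative, not taken from this module):

```python
import jax
import jax.numpy as jnp

def pack_to_capacity(expert_idx, num_experts, capacity):
    """Assign each routed token a slot in its expert's fixed-size buffer.

    expert_idx: [N] int array of expert assignments (hypothetical flat layout).
    Tokens that overflow an expert's capacity are dropped (keep = False).
    """
    onehot = jax.nn.one_hot(expert_idx, num_experts, dtype=jnp.int32)  # [N, E]
    # Running count of tokens per expert gives each token its slot position
    # (1-based where routed, 0 elsewhere).
    position = jnp.cumsum(onehot, axis=0) * onehot                     # [N, E]
    slot = jnp.sum(position, axis=-1) - 1                              # [N], 0-based
    keep = slot < capacity                                             # drop overflow
    return slot, keep

idx = jnp.array([0, 1, 0, 0, 1])
slot, keep = pack_to_capacity(idx, num_experts=2, capacity=2)
# The third token routed to expert 0 (index 3) overflows and is dropped.
```

Dropping overflow tokens keeps every buffer a static shape, which is what makes the layer JIT-friendly.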

__call__(x: jax.Array, deterministic: bool = False) -> jax.Array

Apply top-k expert routing to x of shape [B, T, H].
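Top-k routing selects, for every token, the k experts with the highest gate scores and renormalizes their weights. A self-contained sketch under assumed names (`w_gate` and `topk_route` are illustrative, not this module's API):

```python
import jax
import jax.numpy as jnp

def topk_route(x, w_gate, k=2):
    """Return per-token combine weights and expert indices.

    x: [B, T, H] activations; w_gate: [H, E] hypothetical gating weights.
    """
    logits = jnp.einsum("bth,he->bte", x, w_gate)   # [B, T, E]
    probs = jax.nn.softmax(logits, axis=-1)
    topk_probs, topk_idx = jax.lax.top_k(probs, k)  # both [B, T, k]
    # Renormalize so the k selected experts' weights sum to 1 per token.
    topk_probs = topk_probs / jnp.sum(topk_probs, axis=-1, keepdims=True)
    return topk_probs, topk_idx

B, T, H, E = 2, 4, 8, 4
x = jax.random.normal(jax.random.PRNGKey(0), (B, T, H))
w = jax.random.normal(jax.random.PRNGKey(1), (H, E))
weights, idx = topk_route(x, w)
```

Each token's output is then the weighted sum of its k experts' feed-forward outputs, gathered through the capacity-packed buffers.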

BiasBalancedMoE

Bases: MoE

Batch-local approximation of DeepSeek-style loss-free balancing.

DeepSeek maintains a persistent per-expert bias that is updated from recent routing load. To avoid threading mutable model state through the repo, this implementation instead computes a one-step balancing bias from the current batch's observed load and reroutes once with that bias applied.
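The route-measure-reroute step described above can be sketched as follows (a minimal illustration; the function name and the `bias_scale` knob are assumptions, not this class's API):

```python
import jax
import jax.numpy as jnp

def bias_balanced_topk(logits, k=2, bias_scale=1.0):
    """Batch-local, one-step approximation of loss-free balancing."""
    E = logits.shape[-1]
    # Pass 1: route without bias and measure each expert's observed load
    # as its fraction of all routed slots in this batch.
    _, idx = jax.lax.top_k(logits, k)
    load = jnp.bincount(idx.reshape(-1), length=E) / idx.size
    # One-step bias: boost underloaded experts, penalize overloaded ones.
    bias = bias_scale * (1.0 / E - load)
    # Pass 2: reroute once with the bias added to the routing logits only
    # (as in DeepSeek, the bias affects selection, not the combine weights).
    _, idx_balanced = jax.lax.top_k(logits + bias, k)
    return idx_balanced, bias

logits = jax.random.normal(jax.random.PRNGKey(0), (32, 8))  # [tokens, experts]
idx, bias = bias_balanced_topk(logits)
```

Because `load` sums to 1 over the experts, the bias sums to zero: it redistributes routing pressure within the batch without shifting the overall logit scale.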