base

Mixture-of-experts feed-forward modules.

MoE

Bases: Module

Base MoE feed-forward layer with fixed-capacity expert packing.

__call__(x: jax.Array, deterministic: bool = False) -> jax.Array

Apply top-k expert routing to x of shape [B, T, H].
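To make "top-k routing with fixed-capacity expert packing" concrete, here is a minimal JAX sketch of one common way to do it. It is not this module's actual implementation. The function name `moe_forward_sketch`, the parameters `router_w`, `w_in`, `w_out`, `k`, and `capacity`, the GELU activation, and the gate renormalization are all assumptions made for illustration. The sketch also ignores the `deterministic` flag, whose effect is not described here.

```python
import jax
import jax.numpy as jnp


def moe_forward_sketch(x, router_w, w_in, w_out, k=2, capacity=4):
    """Hypothetical top-k MoE forward pass with fixed-capacity packing.

    x: [B, T, H], router_w: [H, E], w_in: [E, H, F], w_out: [E, F, H].
    Assignments beyond `capacity` slots per expert are dropped.
    """
    B, T, H = x.shape
    E = router_w.shape[-1]
    tokens = x.reshape(B * T, H)                         # [N, H]

    # Router: pick the k highest-probability experts per token.
    probs = jax.nn.softmax(tokens @ router_w, axis=-1)   # [N, E]
    gates, experts = jax.lax.top_k(probs, k)             # both [N, k]
    gates = gates / gates.sum(axis=-1, keepdims=True)    # assumed renormalization

    # Flatten to one row per (token, choice) assignment, token-major.
    flat_experts = experts.reshape(-1)                   # [N*k]
    flat_gates = gates.reshape(-1)                       # [N*k]
    onehot = jax.nn.one_hot(flat_experts, E, dtype=jnp.int32)   # [N*k, E]

    # Slot index = how many earlier assignments went to the same expert.
    slot = ((jnp.cumsum(onehot, axis=0) - 1) * onehot).sum(axis=-1)  # [N*k]
    keep = (slot < capacity).astype(x.dtype)             # 0 where overflowing

    # Dispatch mask [N*k, E, C]: assignment n occupies slot c of expert e.
    slot_onehot = jax.nn.one_hot(slot, capacity, dtype=x.dtype)
    dispatch = onehot.astype(x.dtype)[:, :, None] * slot_onehot[:, None, :]
    dispatch = dispatch * keep[:, None, None]

    # Pack tokens into fixed-size expert buffers and run each expert's FFN.
    tok_rep = jnp.repeat(tokens, k, axis=0)              # [N*k, H]
    packed = jnp.einsum("nec,nh->ech", dispatch, tok_rep)        # [E, C, H]
    hidden = jax.nn.gelu(jnp.einsum("ech,ehf->ecf", packed, w_in))
    expert_out = jnp.einsum("ecf,efh->ech", hidden, w_out)       # [E, C, H]

    # Unpack, weight by gate, and sum the k contributions per token.
    combine = dispatch * flat_gates[:, None, None]
    y = jnp.einsum("nec,ech->nh", combine, expert_out)   # [N*k, H]
    return y.reshape(B * T, k, H).sum(axis=1).reshape(B, T, H)


# Small usage example with random parameters (shapes are arbitrary).
key_x, key_r, key_i, key_o = jax.random.split(jax.random.PRNGKey(0), 4)
x = jax.random.normal(key_x, (2, 3, 8))                  # B=2, T=3, H=8
router_w = jax.random.normal(key_r, (8, 4))              # E=4
w_in = jax.random.normal(key_i, (4, 8, 16)) * 0.1        # F=16
w_out = jax.random.normal(key_o, (4, 16, 8)) * 0.1
y = moe_forward_sketch(x, router_w, w_in, w_out, k=2, capacity=4)
```

The output has the same `[B, T, H]` shape as the input. Because every expert buffer has a fixed size of `capacity` slots, all intermediate shapes are static, which suits `jax.jit`. The cost is that assignments arriving after an expert is full contribute nothing to the output.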