`optimizers`

Core Muon gradient transformation.

Applies three sequential operations to each update.

Parameters with fewer than 2 dimensions receive only the momentum step.
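The dimension rule above can be illustrated with a minimal JAX sketch. Everything here is an assumption made for illustration: the helper names (`momentum_step`, `orthogonalize`, `sketch_update`) are hypothetical, the EMA form is a guess at what "EMA coefficient" means, and the orthogonalization uses the classic cubic Newton-Schulz iteration as a stand-in, not the Polar Express iteration this module uses. The factored second-moment step is left out entirely.

```python
import jax
import jax.numpy as jnp


def momentum_step(buf, grad, momentum=0.95):
    # Assumed EMA form; the library's exact update may differ.
    return momentum * buf + (1.0 - momentum) * grad


def orthogonalize(m, steps=5):
    # Stand-in: cubic Newton-Schulz iteration (NOT Polar Express).
    # Scaling by the Frobenius norm keeps all singular values <= 1,
    # which is inside the iteration's convergence region.
    x = m / (jnp.sqrt(jnp.sum(m * m)) + 1e-7)
    for _ in range(steps):
        xt = jnp.swapaxes(x, -1, -2)
        x = 1.5 * x - 0.5 * (x @ xt @ x)
    return x


def sketch_update(grads, bufs, momentum=0.95, ns_steps=5):
    # Every leaf gets the momentum step.
    new_bufs = jax.tree_util.tree_map(
        lambda b, g: momentum_step(b, g, momentum), bufs, grads
    )
    # Only leaves with 2 or more dimensions get the matrix step;
    # scalars and vectors pass through with momentum only.
    updates = jax.tree_util.tree_map(
        lambda m: orthogonalize(m, ns_steps) if m.ndim >= 2 else m, new_bufs
    )
    return updates, new_bufs
```

With a pytree such as `{"w": jnp.ones((3, 2)), "b": jnp.ones((3,))}`, the bias leaf `b` comes back as its momentum buffer unchanged, while the weight leaf `w` is additionally passed through the matrix step.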

Parameters:

Name	Type	Description	Default
`momentum`	`float`	EMA coefficient for the first-moment buffer.	`0.95`
`ns_steps`	`int`	Number of Polar Express iterations (1–5; 5 recommended).	`5`
`beta2`	`float`	EMA coefficient for the factored second-moment buffer.	`0.95`

Returns:

Type	Description
`GradientTransformation`	An `optax.GradientTransformation`.