math

MATH benchmark rollout evaluation (Hendrycks et al., 2021).

Competition-math problems with LaTeX answers. The dataset's solution field ends with a \boxed{...} containing the final answer; we extract that as the ground truth, and grade each rollout by extracting its final \boxed{...} and comparing it against that answer.
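The extraction step described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the helper names `extract_last_boxed` and `is_correct` are hypothetical, and the exact-string comparison is an assumption (a real grader may normalize LaTeX before comparing). Brace matching is used instead of a regular expression because answers such as \frac{1}{2} contain nested braces.

```python
from typing import Optional


def extract_last_boxed(text: str) -> Optional[str]:
    # Hypothetical helper: return the contents of the last \boxed{...}
    # in `text`, or None if there is none or its braces are unbalanced.
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    content_start = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    i = content_start
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the escaped character, e.g. \{ or \}
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[content_start:i]
        i += 1
    return None  # ran off the end: unbalanced braces


def is_correct(rollout: str, solution: str) -> bool:
    # Assumed grading rule: exact match after stripping whitespace.
    predicted = extract_last_boxed(rollout)
    truth = extract_last_boxed(solution)
    if predicted is None or truth is None:
        return False
    return predicted.strip() == truth.strip()
```

For example, `extract_last_boxed(r"so $x = \boxed{\frac{1}{2}}$")` returns `\frac{1}{2}`, and a rollout containing several boxed expressions is graded on the last one only.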

MathEval()

Bases: RolloutEvaluation

MATH benchmark rollout evaluation (test split).