

gsm8k

GSM8K rollout evaluation (Cobbe et al., 2021).

Grade-school math word problems. The ground-truth answer follows a #### <int> sentinel in the dataset's answer field. A model's rollout is graded by extracting its final \boxed{...}, its final #### N line, or, failing those, the last integer in the rollout.
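The grading rule above can be illustrated with a minimal sketch. It assumes the three extraction strategies are tried in the order listed (\boxed{...} first, then a #### N line, then the last integer), that answers are compared as plain integers with thousands separators stripped, and that \boxed{...} contains no nested braces. The helper names parse_ground_truth, extract_answer and is_correct are hypothetical and are not part of GSM8KEval's API.

```python
import re
from typing import Optional

# Hypothetical helpers: a sketch of the grading rule, not GSM8KEval's code.
_GT_RE = re.compile(r"####\s*(-?\d[\d,]*)")            # "#### 72" in the answer field
_BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")          # \boxed{...}, no nested braces
_HASH_LINE_RE = re.compile(r"^\s*####\s*(.+?)\s*$", re.MULTILINE)  # "#### N" lines
_INT_RE = re.compile(r"-?\d[\d,]*")                     # integers, commas allowed


def _last_int(text: str) -> Optional[int]:
    """Return the last integer in text (commas stripped), or None if there is none.

    Decimals are not handled: "3.50" yields 50.
    """
    found = _INT_RE.findall(text)
    if not found:
        return None
    return int(found[-1].replace(",", ""))


def parse_ground_truth(answer_field: str) -> int:
    """Read the integer that follows the '####' sentinel in the dataset's answer field."""
    match = _GT_RE.search(answer_field)
    if match is None:
        raise ValueError("answer field has no '#### <int>' sentinel")
    return int(match.group(1).replace(",", ""))


def extract_answer(rollout: str) -> Optional[int]:
    """Extract a predicted integer, trying each strategy in turn (assumed order)."""
    # 1. Contents of the final \boxed{...}.
    boxed = _BOXED_RE.findall(rollout)
    if boxed:
        value = _last_int(boxed[-1])
        if value is not None:
            return value
    # 2. The final '#### N' line.
    hash_lines = _HASH_LINE_RE.findall(rollout)
    if hash_lines:
        value = _last_int(hash_lines[-1])
        if value is not None:
            return value
    # 3. Fallback: the last integer anywhere in the rollout.
    return _last_int(rollout)


def is_correct(rollout: str, answer_field: str) -> bool:
    """True if the extracted prediction equals the ground-truth integer."""
    predicted = extract_answer(rollout)
    return predicted is not None and predicted == parse_ground_truth(answer_field)
```

For example, extract_answer(r"\boxed{1,000} after 5 steps") returns 1000 rather than 5, because the boxed answer takes precedence over the last-integer fallback. Real graders often add further normalization (units, decimals, LaTeX fractions) that this sketch omits.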

GSM8KEval()

Bases: RolloutEvaluation

GSM8K rollout evaluation (test split).