arithmetic
arithmetic
¶
EleutherAI/arithmetic rollout evaluation.
The dataset is stored as one JSONL file per arithmetic task on the Hub. We load
all files into one evaluation, strip the Q:/A: framing, wrap the math
question in a chat template, and grade the assistant's first numeric token
against the ground-truth integer answer.