arithmetic

EleutherAI/arithmetic rollout evaluation.

The dataset is stored as one JSONL file per arithmetic task on the Hub. We load all files into one evaluation, strip the Q:/A: framing, wrap the math question in a chat template, and grade the assistant's first numeric token against the ground-truth integer answer.

ArithmeticEval()

Bases: RolloutEvaluation

EleutherAI/arithmetic rollout evaluation.

load_arithmetic_dataset() -> Any

Load all arithmetic JSONL files from the Hub as one dataset.

The packaged EleutherAI/arithmetic loading script is not usable with current releases of the datasets library, so this function uses the JSON builder directly.
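One way to load the files with the JSON builder is sketched below. It is a minimal sketch, not the body of load_arithmetic_dataset(): the helper names `select_jsonl` and `load_all_tasks` are hypothetical, the repository layout is discovered at run time rather than assumed, and putting every file into a single `train` split is an assumption about how "one dataset" is formed.

```python
REPO_ID = "EleutherAI/arithmetic"


def select_jsonl(repo_files: list[str]) -> list[str]:
    """Keep only the JSONL files, sorted so the load order is deterministic."""
    return sorted(f for f in repo_files if f.endswith(".jsonl"))


def load_all_tasks(repo_id: str = REPO_ID):
    """Download every JSONL file in the dataset repo and load them as one dataset."""
    # Imported lazily so select_jsonl stays usable without these libraries installed.
    from datasets import load_dataset
    from huggingface_hub import hf_hub_download, list_repo_files

    files = select_jsonl(list_repo_files(repo_id, repo_type="dataset"))
    local_paths = [
        hf_hub_download(repo_id, filename, repo_type="dataset") for filename in files
    ]
    # The generic "json" builder reads JSON Lines; a flat list of files is
    # concatenated into a single split, here named "train".
    return load_dataset("json", data_files=local_paths, split="train")
```

Calling `load_all_tasks()` needs network access to the Hub. Going through the generic `json` builder avoids executing the repository's own loading script.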