perplexity_evals

Perplexity evaluations for existing datasets with validation splits.

These complement the existing RolloutEvaluation counterparts by measuring how well the model predicts held-out validation tokens. Each eval reports inverse perplexity (1/perplexity), so higher is better, which is especially useful for tracking forgetting in continual learning experiments.
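The PerplexityEvaluation interface itself is not documented here, so the following is only a minimal sketch of the metric these evals report, written against Hugging Face transformers and datasets rather than the repository's actual classes. The model name is a placeholder, and the MNLI example formatting (premise followed by hypothesis) is an assumption:

```python
# Sketch only: illustrates the inverse-perplexity metric these evals report.
# The repository's real PerplexityEvaluation interface may differ.
import math

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; substitute the model under evaluation
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# e.g. the MNLI matched validation split, capped at num_samples examples
num_samples = 500
ds = load_dataset("nyu-mll/glue", "mnli", split="validation_matched")
ds = ds.select(range(min(num_samples, len(ds))))
# assumed formatting: concatenate premise and hypothesis into one sequence
texts = [f"{ex['premise']} {ex['hypothesis']}" for ex in ds]

total_nll, total_tokens = 0.0, 0
with torch.no_grad():
    for text in texts:
        enc = tokenizer(text, return_tensors="pt", truncation=True)
        # labels=input_ids makes the model return the mean token NLL as .loss
        out = model(**enc, labels=enc["input_ids"])
        n_pred = enc["input_ids"].shape[1] - 1  # loss covers shifted positions
        total_nll += out.loss.item() * n_pred
        total_tokens += n_pred

perplexity = math.exp(total_nll / total_tokens)
inverse_perplexity = 1.0 / perplexity  # higher is better
print(f"perplexity={perplexity:.2f} inverse={inverse_perplexity:.4f}")
```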

MNLIPerplexityEval(num_samples: int = 500)

Bases: PerplexityEvaluation

Perplexity on the MNLI validation_matched split.

QQPPerplexityEval(num_samples: int = 500)

Bases: PerplexityEvaluation

Perplexity on the QQP validation split.

SST2PerplexityEval(num_samples: int = 500)

Bases: PerplexityEvaluation

Perplexity on the SST-2 validation split.

SIQAPerplexityEval(num_samples: int = 500)

Bases: PerplexityEvaluation

Perplexity on the Social IQa validation split.

WinograndePerplexityEval(num_samples: int = 500)

Bases: PerplexityEvaluation

Perplexity on the Winogrande validation split.

FineWebPerplexityEval(num_samples: int = 500)

Bases: PerplexityEvaluation

Perplexity on a sample from FineWeb.
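Since the subclasses above appear to differ only in which dataset and split they sample from, one plausible shape for them is a base class whose subclasses override a text loader. This is a hypothetical structure for illustration only; the field and method names are assumptions, not the repository's actual API:

```python
# Hypothetical structure for illustration; names are assumptions,
# not the repository's actual PerplexityEvaluation API.
from dataclasses import dataclass

from datasets import load_dataset


@dataclass
class PerplexityEvalSketch:
    num_samples: int = 500

    def load_texts(self) -> list[str]:
        raise NotImplementedError


@dataclass
class MNLIPerplexityEvalSketch(PerplexityEvalSketch):
    def load_texts(self) -> list[str]:
        ds = load_dataset("nyu-mll/glue", "mnli", split="validation_matched")
        ds = ds.select(range(min(self.num_samples, len(ds))))
        return [f"{ex['premise']} {ex['hypothesis']}" for ex in ds]


@dataclass
class SST2PerplexityEvalSketch(PerplexityEvalSketch):
    def load_texts(self) -> list[str]:
        ds = load_dataset("nyu-mll/glue", "sst2", split="validation")
        ds = ds.select(range(min(self.num_samples, len(ds))))
        return [ex["sentence"] for ex in ds]
```

Under this kind of design, the perplexity computation lives once in the base class and each dataset-specific eval only supplies its split and text formatting, which matches how uniformly the subclasses are documented above.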