blimp

BLiMP (Benchmark of Linguistic Minimal Pairs) evaluation.

Tests grammatical knowledge via minimal pair acceptability judgments.

Blimp(subset: str | None = None)

Bases: PerplexityComparisonEvaluation

BLiMP evaluation using perplexity comparison.

Each sample pairs a grammatical sentence with a minimally different ungrammatical one. The model should assign lower perplexity to the grammatical sentence.
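The comparison can be illustrated with a minimal sketch. It assumes perplexity is computed as the exponential of the negative mean token log-probability, which is the usual definition but is not confirmed by this page. The function names `perplexity` and `prefers_correct` and the log-probability values are hypothetical and are not part of this library.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Exponential of the negative mean token log-probability (assumed definition)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def prefers_correct(good_logprobs: Sequence[float], bad_logprobs: Sequence[float]) -> bool:
    """True when the grammatical sentence gets strictly lower perplexity."""
    return perplexity(good_logprobs) < perplexity(bad_logprobs)


# Made-up per-token log-probabilities, for illustration only.
good = [-1.0, -0.5, -0.5]  # hypothetical scores for the grammatical sentence
bad = [-1.0, -2.0, -3.0]   # hypothetical scores for the ungrammatical sentence

print(prefers_correct(good, bad))
```

Averaging over tokens keeps the comparison from favouring the shorter sentence of a pair, which matters when the two sentences differ in length.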

get(indx: int) -> Tuple[str, list[str], int]

Get sample at index.

Returns:

Type: Tuple[str, list[str], int]

Description: (prefix, list_of_continuations, correct_index)
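A sketch of how a tuple in this shape might be consumed follows. It assumes each candidate is scored on the prefix concatenated with the continuation and that the lowest perplexity wins; neither detail is stated on this page. The helpers `predicted_index`, `is_correct`, and `toy_perplexity` are hypothetical, and the sample is hand-written rather than taken from the dataset.

```python
from typing import Callable, List, Tuple

# Documented return shape of get(): (prefix, list_of_continuations, correct_index)
Sample = Tuple[str, List[str], int]


def predicted_index(sample: Sample, perplexity_fn: Callable[[str], float]) -> int:
    """Index of the continuation with the lowest perplexity."""
    prefix, continuations, _ = sample
    # Assumption: each candidate is scored as prefix + continuation.
    scores = [perplexity_fn(prefix + c) for c in continuations]
    return min(range(len(scores)), key=scores.__getitem__)


def is_correct(sample: Sample, perplexity_fn: Callable[[str], float]) -> bool:
    """True when the lowest-perplexity continuation is the labelled one."""
    return predicted_index(sample, perplexity_fn) == sample[2]


def toy_perplexity(text: str) -> float:
    """Stand-in scorer for illustration; a real model would go here."""
    return 10.0 if "cats sleeps" in text else 2.0


# Hand-written sample in the documented shape, with an empty prefix.
sample: Sample = ("", ["The cats sleep.", "The cats sleeps."], 0)

print(is_correct(sample, toy_perplexity))
```

Accuracy over a subset would then be the fraction of indices for which `is_correct` holds.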