
mtob


MTOB(split: str = 'train', config: str | None = 'en-kgv')

Bases: ChatTemplateDataset

MTOB: Machine Translation from One Book (Grammar-Book benchmark).

Translation between English and Kalamang, an extremely low-resource language with fewer than 200 speakers, using grammar book reference materials. The data is downloaded from the official GitHub repository.

The config argument selects the subset:
  • "en-kgv" (default): English -> Kalamang translation pairs
  • "kgv-en": Kalamang -> English translation pairs
  • "dictionary": Kalamang -> English dictionary entries
  • "grammar" / "grammar-long" / "grammar-full": Grammar book content, for pretraining or for use as context
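The config values above can be read as a small dispatch table. The sketch below is hypothetical and is not the MTOB class's implementation. It uses only the subset names listed above. The helper names resolve_config and translation_direction are invented for illustration. It also assumes that config=None falls back to the documented default "en-kgv", which the real class may not do.

```python
# Hypothetical sketch, NOT the library's code: it maps the documented
# `config` values to the kind of data each subset holds.
from typing import Optional, Tuple

# Subset names come from the docstring. "kgv" labels Kalamang, as in "en-kgv".
MTOB_CONFIGS = {
    "en-kgv": "translation",      # English -> Kalamang pairs
    "kgv-en": "translation",      # Kalamang -> English pairs
    "dictionary": "dictionary",   # Kalamang -> English entries
    "grammar": "grammar",         # grammar book content
    "grammar-long": "grammar",
    "grammar-full": "grammar",
}

def resolve_config(config: Optional[str]) -> str:
    """Validate a config name. ASSUMPTION: None means the default 'en-kgv'."""
    name = "en-kgv" if config is None else config
    if name not in MTOB_CONFIGS:
        raise ValueError(
            f"unknown MTOB config {name!r}; expected one of {sorted(MTOB_CONFIGS)}"
        )
    return name

def translation_direction(config: Optional[str]) -> Optional[Tuple[str, str]]:
    """Return (source, target) for pair-style subsets, or None for grammar text."""
    name = resolve_config(config)
    if name == "en-kgv":
        return ("en", "kgv")
    if name in ("kgv-en", "dictionary"):
        # Dictionary entries also run Kalamang -> English.
        return ("kgv", "en")
    return None  # Grammar subsets are reference text, not aligned pairs.

if __name__ == "__main__":
    for cfg in ("en-kgv", "kgv-en", "dictionary", "grammar-full"):
        print(cfg, translation_direction(cfg))
```

Failing fast on an unknown name is a deliberate choice in this sketch. It surfaces a typo such as "kgv_en" immediately instead of silently loading the wrong subset.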