mtob
MTOB(split: str = 'train', config: str | None = 'en-kgv')
Bases: ChatTemplateDataset
MTOB: Machine Translation from One Book (Grammar-Book benchmark).
Translation between English and Kalamang (an extremely low-resource language with <200 speakers), using grammar book reference materials. The data is downloaded from the official GitHub repository.
Config selects the subset:
- "en-kgv" (default): English -> Kalamang translation pairs
- "kgv-en": Kalamang -> English translation pairs
- "dictionary": Kalamang -> English dictionary entries
- "grammar" / "grammar-long" / "grammar-full": Grammar book content for pretraining/context
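As a minimal sketch of how a caller might guard against a misspelled config name before constructing the dataset: the config names below are the ones listed above, while `MTOB_CONFIGS` and `check_config` are hypothetical helpers introduced only for illustration and are not part of the documented API. The import path for `MTOB` is not given here, so the constructor call is left as a comment.

```python
# Config names taken from the MTOB docstring above.
# MTOB_CONFIGS and check_config are hypothetical, for illustration only.
MTOB_CONFIGS = {
    "en-kgv": "English -> Kalamang translation pairs",
    "kgv-en": "Kalamang -> English translation pairs",
    "dictionary": "Kalamang -> English dictionary entries",
    "grammar": "Grammar book content for pretraining/context",
    "grammar-long": "Grammar book content for pretraining/context",
    "grammar-full": "Grammar book content for pretraining/context",
}


def check_config(config: str) -> str:
    """Return `config` unchanged if it is a documented subset name."""
    if config not in MTOB_CONFIGS:
        known = ", ".join(sorted(MTOB_CONFIGS))
        raise ValueError(f"Unknown MTOB config {config!r}; expected one of: {known}")
    return config


# Assumed usage, following the documented signature
# MTOB(split='train', config='en-kgv'); the import is not shown in the source:
# dataset = MTOB(split="train", config=check_config("kgv-en"))
```

Checking the name up front gives a readable error listing the documented subsets; how the class itself reacts to an unknown config, or to `config=None`, is not stated in this section.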