pes2o

Pes2O(config: str | None = None, split: str = 'train')

Bases: StreamingPretrainingDataset

Scientific papers from the peS2o corpus (AllenAI).

Streams full-text academic papers derived from the Semantic Scholar Open Research Corpus. Loads allenai/peS2o through the Hugging Face Hub's automatic Parquet conversion, which avoids the dataset's deprecated custom loading scripts.
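The Parquet route described above can be sketched as follows. This is a minimal illustration, not the class's actual implementation: the helper names `parquet_load_kwargs` and `stream_papers` are hypothetical, and the sketch assumes the converted files live on the Hub's `refs/convert/parquet` branch, laid out as `<config>/<split>/*.parquet`. Only `datasets.load_dataset` and its `path`, `data_dir`, `split`, `streaming`, and `revision` parameters come from the `datasets` library.

```python
from __future__ import annotations

from typing import Any, Iterator


def parquet_load_kwargs(config: str | None = None, split: str = "train") -> dict[str, Any]:
    """Build keyword arguments for datasets.load_dataset (hypothetical helper)."""
    kwargs: dict[str, Any] = {
        "path": "allenai/peS2o",
        "split": split,
        # Stream records instead of downloading the whole corpus first.
        "streaming": True,
        # Assumption: read the auto-converted Parquet branch rather than
        # the default branch, which holds the custom loading script.
        "revision": "refs/convert/parquet",
    }
    if config is not None:
        # Assumption: each config is a top-level directory on that branch.
        kwargs["data_dir"] = config
    return kwargs


def stream_papers(config: str | None = None, split: str = "train") -> Iterator[dict[str, Any]]:
    """Yield one record at a time; needs the `datasets` package and network access."""
    from datasets import load_dataset  # imported lazily so the helper above stays dependency-free

    yield from load_dataset(**parquet_load_kwargs(config, split))
```

Calling `next(stream_papers())` would fetch a single record without materializing the dataset locally. The record fields are not listed here because the source does not state them.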