pes2o
Pes2O(config: str | None = None, split: str = 'train')
Bases: StreamingPretrainingDataset
Scientific papers from the peS2o corpus (AllenAI).
Streams full-text academic papers derived from the Semantic Scholar
Open Research Corpus (S2ORC). Loads allenai/peS2o through the parquet
auto-conversion, which avoids the dataset's deprecated custom loading script.
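
The parquet route described above can be sketched roughly as follows. This is not the class's actual implementation, only a minimal illustration under stated assumptions: `build_load_kwargs` and `stream_papers` are hypothetical helper names, the Hugging Face `datasets` library is assumed to be installed, and mapping `config` onto `data_dir` is a guess about how the converted branch is laid out.

```python
from __future__ import annotations

from typing import Any, Iterator

# Branch on the Hugging Face Hub that holds auto-converted parquet files.
PARQUET_REVISION = "refs/convert/parquet"


def build_load_kwargs(config: str | None = None, split: str = "train") -> dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for datasets.load_dataset."""
    kwargs: dict[str, Any] = {
        "path": "allenai/peS2o",
        "revision": PARQUET_REVISION,  # read parquet files, not the loading script
        "split": split,
        "streaming": True,  # iterate lazily instead of downloading everything
    }
    if config is not None:
        # Assumption: the converted branch keeps one directory per config.
        kwargs["data_dir"] = config
    return kwargs


def stream_papers(config: str | None = None, split: str = "train") -> Iterator[dict]:
    """Hypothetical helper: yield paper records one at a time (needs network access)."""
    from datasets import load_dataset  # imported lazily so the helper above stays dependency-free

    yield from load_dataset(**build_load_kwargs(config, split))
```

Keeping the argument construction separate from the network call makes the chosen revision and split easy to inspect without downloading any data.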