
pes2o

pes2o

Pes2O(config: str | None = None, split: str = 'train')

Bases: StreamingPretrainingDataset

Scientific papers from the peS2o corpus (AllenAI).

Streams gzipped JSONL shards directly from the Hugging Face Hub repo, bypassing the deprecated loading script (script support was removed in datasets 4.0). Each shard is downloaded via hf_hub_download (cached locally), then read line by line.
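The streaming approach described above can be sketched roughly as follows. This is a minimal illustration, not the class's actual implementation. The helper names (iter_jsonl_gz, stream_shards, hf_downloader) are hypothetical, and shard filenames and the repo id are left to the caller rather than guessed. The downloader is injected as a callable so the reading logic can run without network access. hf_downloader shows how huggingface_hub.hf_hub_download could plug in; that function returns the local path of the cached file.

```python
import gzip
import json
from typing import Callable, Iterable, Iterator


def iter_jsonl_gz(path: str) -> Iterator[dict]:
    """Yield one JSON record per non-empty line of a gzipped JSONL file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines rather than failing on json.loads("")
                yield json.loads(line)


def stream_shards(
    shard_names: Iterable[str], download: Callable[[str], str]
) -> Iterator[dict]:
    """Download each shard (hypothetical helper) and stream its records in order.

    `download` maps a shard filename to a local file path; only one shard is
    read at a time, so records are never all held in memory.
    """
    for name in shard_names:
        local_path = download(name)
        yield from iter_jsonl_gz(local_path)


def hf_downloader(repo_id: str) -> Callable[[str], str]:
    """Build a download callable backed by hf_hub_download (assumed usage).

    huggingface_hub is imported lazily so the rest of this sketch works
    without it installed. Downloads are cached locally by huggingface_hub.
    """
    from huggingface_hub import hf_hub_download

    def download(filename: str) -> str:
        return hf_hub_download(repo_id=repo_id, filename=filename, repo_type="dataset")

    return download
```

With a real repo, one would call something like stream_shards(shard_files, hf_downloader(repo_id)), where shard_files is the list of shard paths in the repo. Keeping the downloader separate from the line reader makes the parsing logic easy to test against local files.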