pes2o
Pes2O(config: str | None = None, split: str = 'train')
Bases: StreamingPretrainingDataset
Scientific papers from the peS2o corpus (AllenAI).
Streams gzipped JSONL shards directly from the Hugging Face Hub repository,
bypassing the deprecated loading script (support for loading scripts was
removed in datasets 4.0). Each shard is downloaded via hf_hub_download
(cached locally) and then read line by line.
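
The download-then-read pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation of Pes2O: the helper names `iter_jsonl_gz` and `stream_shards` are hypothetical, and the shard file names are left as a caller-supplied argument because the repository layout is not stated here. Only `hf_hub_download` from `huggingface_hub` and the standard-library `gzip` and `json` modules are assumed.

```python
import gzip
import json
from typing import Iterator


def iter_jsonl_gz(path: str) -> Iterator[dict]:
    """Yield one parsed JSON record per non-empty line of a gzipped JSONL file."""
    # Text mode ("rt") decompresses lazily, so the shard is never fully held in memory.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def stream_shards(repo_id: str, shard_files: list[str]) -> Iterator[dict]:
    """Download each shard in turn, then stream its records (hypothetical helper)."""
    # Imported lazily so iter_jsonl_gz stays usable without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    for filename in shard_files:
        # hf_hub_download returns a path in the local cache; repeat calls reuse it.
        local_path = hf_hub_download(
            repo_id=repo_id, filename=filename, repo_type="dataset"
        )
        yield from iter_jsonl_gz(local_path)
```

Because both helpers are generators, records are produced one at a time and a consumer can stop early without downloading the remaining shards.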