pile_detoxify
PileDetoxify(config: str | None = None, split: str = 'train')
Bases: StreamingPretrainingDataset
Filtered Pile with toxicity scores (Korbak et al.).
Streams text from tomekkorbak/pile-detoxify, which annotates
Pile documents with per-sentence toxicity scores from Detoxify.
Each yielded string is the full text of one document, formed by joining its sentences.
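
As a rough illustration of the "sentences joined" behaviour, the sketch below rebuilds one string per document from per-sentence records. It is not the class's actual implementation. The helper names `iter_documents` and `stream_pile_detoxify`, the `texts` field name, and the empty-string separator are all assumptions made for this example; check the dataset card for the real schema.

```python
from typing import Any, Dict, Iterable, Iterator


def iter_documents(
    records: Iterable[Dict[str, Any]],
    text_field: str = "texts",  # assumed name of the per-sentence list field
    sep: str = "",              # assumed separator; sentences may already carry whitespace
) -> Iterator[str]:
    """Yield one string per record by joining its list of sentences."""
    for record in records:
        yield sep.join(record[text_field])


def stream_pile_detoxify(split: str = "train") -> Iterator[str]:
    """Hypothetical helper: stream the Hub dataset and join sentences per document."""
    # Imported lazily so iter_documents works without the `datasets` package.
    from datasets import load_dataset

    # streaming=True iterates over the data without downloading it all first.
    dataset = load_dataset("tomekkorbak/pile-detoxify", split=split, streaming=True)
    return iter_documents(dataset)


if __name__ == "__main__":
    # In-memory records standing in for dataset rows (made-up values).
    sample = [
        {"texts": ["First sentence. ", "Second sentence."], "scores": [0.01, 0.02]},
        {"texts": ["Only sentence."], "scores": [0.00]},
    ]
    for document in iter_documents(sample):
        print(document)
```

Keeping the joining logic separate from the loading step means it can be tried on in-memory records without network access.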