pmd.py "Poor Man's Dataloader" Datasets

MemmapDataset(spec: ExecutionSpec, block_size: int, name: str, suffix: str = '')

Bases: Dataset

get_batch(batch_size: int, split: str = 'train', deterministic_key: Optional[int] = None) -> Dict[str, np.ndarray]

Get batches using cache-optimal sequential block reads.
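The idea of a sequential block read can be illustrated with a minimal sketch. This is not the library's implementation: the function name `get_batch_sketch`, the `inputs`/`targets` dictionary keys, the `uint16` token file, and the way `deterministic_key` seeds the sampler are all assumptions made for illustration. The sketch reads one contiguous span from a memory-mapped file and reshapes it into a batch, so the operating system serves neighbouring pages instead of many scattered small reads.

```python
import os
import tempfile
from typing import Dict, Optional

import numpy as np


def get_batch_sketch(
    tokens: np.ndarray,
    batch_size: int,
    block_size: int,
    deterministic_key: Optional[int] = None,
) -> Dict[str, np.ndarray]:
    """Hypothetical helper: build a batch from one contiguous read."""
    # One span holds every row of the batch plus one lookahead token.
    span = batch_size * block_size + 1
    if len(tokens) < span:
        raise ValueError("token file is too short for this batch shape")
    # Assumption: the key seeds the sampler; None gives a random start.
    rng = np.random.default_rng(deterministic_key)
    start = int(rng.integers(0, len(tokens) - span + 1))
    # A single sequential slice; np.array copies it out of the memmap.
    chunk = np.array(tokens[start : start + span], dtype=np.int64)
    # Targets are the inputs shifted by one token (assumed key names).
    inputs = chunk[:-1].reshape(batch_size, block_size)
    targets = chunk[1:].reshape(batch_size, block_size)
    return {"inputs": inputs, "targets": targets}


# Usage: write a small token file, then map it read-only.
path = os.path.join(tempfile.mkdtemp(), "train.bin")
np.arange(1000, dtype=np.uint16).tofile(path)
tokens = np.memmap(path, dtype=np.uint16, mode="r")
batch = get_batch_sketch(tokens, batch_size=4, block_size=8, deterministic_key=0)
```

Rows produced this way are adjacent in the file, which trades some sample diversity within a batch for read locality. Whether `MemmapDataset` makes the same trade-off is not stated in this section.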