NumpyFileDataset
- class axtreme.data.numpy_file_dataset.NumpyFileDataset(root_dir: str | Path)
Bases: Dataset[Tensor]
Helper to work with directories of .npy data.
Note
It is highly recommended to use an in-memory dataset if possible; reading from disk is typically the bottleneck.
- Using a sequential sampler will be significantly faster, because this class performs rudimentary caching.
Random sampling requires a disk read for EVERY datapoint. Instead, consider randomising the row order when the files are saved, as in the sketch below.
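A minimal sketch of that suggestion, assuming you control how the files are written: shuffle the rows once at save time, so a fast sequential read still visits datapoints in random order (the directory and file names here are illustrative)::

    from pathlib import Path

    import numpy as np

    # Illustrative data: 1000 rows of 3 features, shuffled ONCE at save time.
    rng = np.random.default_rng(0)
    data = rng.normal(size=(1000, 3))
    data = data[rng.permutation(len(data))]

    out_dir = Path("data_dir")
    out_dir.mkdir(exist_ok=True)
    # np.split produces equally sized chunks, matching the assumption below
    # that every file holds the same number of datapoints.
    for i, chunk in enumerate(np.split(data, 10)):
        np.save(out_dir / f"part_{i}.npy", chunk)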
- Assumes:
Each row is a data point.
Each file has the same number of datapoints within it (see the usage sketch below).
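A minimal usage sketch under these assumptions. The class and its root_dir parameter come from the signature above; the directory name and DataLoader settings are illustrative::

    from torch.utils.data import DataLoader

    from axtreme.data.numpy_file_dataset import NumpyFileDataset

    dataset = NumpyFileDataset("data_dir")  # directory of .npy files
    # shuffle=False gives sequential access, which benefits from the caching
    # described in the note above.
    loader = DataLoader(dataset, batch_size=64, shuffle=False)
    for batch in loader:
        ...  # each batch is a Tensor built from rows of the .npy files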
- Dev:
Based on example here: https://pytorch.org/tutorials/beginner/data_loading_tutorial.html
Todo
- This is slow compared to reading from memory: a 100k dataset takes 3 ms from memory vs 20 s with this dataloader.
This is because EVERY SINGLE datapoint requires a new read from disk.
- TODO: Consider different file types that might be more appropriate (row-specific access).
HDF5?
- TODO: Consider integration with an importance weight.
Look at the example with images; a dataset can return multiple things.
npz allows you to combine multiple arrays; there could be some logic such as: if the other array is None, there is no importance weight.
- Answered:
- Is Sampler a more appropriate way of framing this?
No. The Sampler class just takes an existing dataset and shuffles it, like shuffle in DataLoader.
- __init__(root_dir: str | Path) → None
Initialise the Dataset.
Note
Data should be loaded lazily (in __getitem__, not here); a generic sketch of this pattern follows the parameter list.
- Parameters:
root_dir (str | Path) – Directory containing the .npy files.
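A generic sketch of the lazy-loading pattern the note above describes, not the actual NumpyFileDataset implementation: __init__ only records file paths, and the disk read happens in __getitem__::

    from pathlib import Path

    import numpy as np
    import torch
    from torch.utils.data import Dataset


    class LazyNpyDataset(Dataset):
        """Illustrative only: loads rows lazily, one disk read per __getitem__."""

        def __init__(self, root_dir: str | Path) -> None:
            self.files = sorted(Path(root_dir).glob("*.npy"))
            # Peek at one file to learn rows-per-file; no data is kept in memory.
            self.rows_per_file = np.load(self.files[0], mmap_mode="r").shape[0]

        def __len__(self) -> int:
            return len(self.files) * self.rows_per_file

        def __getitem__(self, idx: int) -> torch.Tensor:
            file_idx, row_idx = divmod(idx, self.rows_per_file)
            array = np.load(self.files[file_idx])  # lazy: disk read happens here
            return torch.as_tensor(array[row_idx])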
Methods
__init__(root_dir) – Initialise the Dataset.