Dataset Containers
MLDatasets.jl contains several reusable data containers for accessing datasets in common storage formats. This feature is a work-in-progress and subject to change.
MLDatasets.FileDataset — Type

```julia
FileDataset([loadfn = FileIO.load,] paths)
FileDataset([loadfn = FileIO.load,] dir, pattern = "*", depth = 4)
```

Wrap a set of file paths as a dataset (traversed in the same order as paths). Alternatively, specify a dir and collect all paths that match a glob pattern (recursively globbing by depth). The glob order determines the traversal order.
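A minimal sketch of typical usage. The temporary directory, file names, and the text-file `loadfn` below are illustrative, not part of the API; a custom `loadfn` is used here so the example does not depend on FileIO:

```julia
using MLDatasets: FileDataset
using MLUtils: getobs, numobs

# Illustrative setup: write three small text files to a temporary directory.
dir = mktempdir()
for i in 1:3
    write(joinpath(dir, "sample$i.txt"), "observation $i")
end

# Wrap the directory as a dataset; each observation is loaded on access.
ds = FileDataset(path -> read(path, String), dir, "*.txt")

numobs(ds)      # number of files matched by the glob pattern
getobs(ds, 1)   # loads the first file via the supplied loadfn
```

Because loading happens per observation, a `FileDataset` stays cheap to construct even for large directories.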
MLDatasets.CachedDataset — Type

```julia
CachedDataset(source, cachesize = numobs(source))
CachedDataset(source, cacheidx = 1:numobs(source))
CachedDataset(source, cacheidx, cache)
```

Wrap a source data container and cache cachesize samples in memory. This can be useful for improving read speeds when source is a lazy data container, but your system memory is large enough to store a sizeable chunk of it.
By default the first cachesize observations (indices 1:cachesize) are cached. You can also pass in a custom set of cacheidx.
See also make_cache for customizing the default cache creation for source.
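A minimal sketch using a plain vector as the source (any MLUtils-compatible data container works the same way; a vector stands in for a lazy, slow-to-read source):

```julia
using MLDatasets: CachedDataset
using MLUtils: getobs, numobs

source = collect(10:10:100)        # stands in for a lazy data container
cached = CachedDataset(source, 5)  # observations 1:5 are copied into memory

getobs(cached, 3)   # inside 1:5, served from the in-memory cache
getobs(cached, 8)   # outside the cache, read from source as usual
numobs(cached)      # same length as source
```

Reads at cached indices never touch the source again, so the wrapper is transparent apart from the speedup.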
MLDatasets.make_cache — Function

```julia
make_cache(source, cacheidx)
```

Return an in-memory copy of source at observation indices cacheidx. Defaults to getobs(source, cacheidx).
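To customize how the cache is built for a particular container type, add a method of make_cache. The SquaresSource type below is hypothetical, shown only to illustrate the extension point:

```julia
using MLDatasets: CachedDataset
import MLDatasets: make_cache
import MLUtils

# Hypothetical lazy container whose observations are computed on demand.
struct SquaresSource end
MLUtils.numobs(::SquaresSource) = 100
MLUtils.getobs(::SquaresSource, i::Integer) = i^2

# Customize caching: store Float32 values instead of the default
# `getobs(source, cacheidx)` copy.
make_cache(s::SquaresSource, cacheidx) =
    Float32[MLUtils.getobs(s, i) for i in cacheidx]

cached = CachedDataset(SquaresSource(), 10)  # builds the cache via the custom method
```

Since CachedDataset calls make_cache at construction time, the custom method controls both the eltype and the layout of what is held in memory.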