RDataGet.jl

A simple Julia library to fetch R datasets from CRAN

MIT License


RDataGet

RDataGet fetches tabular R datasets from CRAN. It is an alternative to RDatasets.jl that downloads data on demand rather than bundling it.

The basic usage is similar to RDatasets.jl. You can install it as follows:

using Pkg
Pkg.add(url="https://github.com/frankier/RDataGet.jl.git")

Once RDataGet is installed, you can load datasets with the dataset() function, which takes the name of a CRAN package and the name of a dataset within it:

using RDataGet
harman_political = dataset("psych", "Harman.political")
neuro = dataset("boot", "neuro")

Limitations

This package simply downloads source packages from CRAN and loads their datasets into memory in Julia; it does not depend on R itself.
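Downloading a source package from CRAN first requires knowing its latest version number, which comes from the CRAN package index. That lookup can be sketched in Julia as follows. The sketch assumes CRAN's standard layout, where the index is the PACKAGES file under https://cran.r-project.org/src/contrib and source tarballs in the same directory are named <package>_<version>.tar.gz. The names fetch_index, index_version and tarball_url are hypothetical and not part of RDataGet's API; this illustrates the general approach, not necessarily how RDataGet does it.

```julia
# Hypothetical helpers illustrating CRAN's layout; not RDataGet's API.
using Downloads

const CRAN_CONTRIB = "https://cran.r-project.org/src/contrib"

# Download the full package index as a string (requires network access).
fetch_index() = read(Downloads.download("$CRAN_CONTRIB/PACKAGES"), String)

"""
    index_version(index_text, pkg)

Return the `Version:` field for `pkg` from the text of a CRAN `PACKAGES`
index (records separated by blank lines, one `Field: value` per line).
Returns `nothing` if the package is not listed.
"""
function index_version(index_text::AbstractString, pkg::AbstractString)
    for record in split(index_text, r"\n\s*\n")
        fields = Dict{String,String}()
        for line in split(record, '\n')
            # Continuation lines start with whitespace and are skipped here.
            m = match(r"^([A-Za-z0-9@/._-]+):\s*(.*)$", line)
            m === nothing && continue
            fields[m.captures[1]] = strip(m.captures[2])
        end
        if get(fields, "Package", nothing) == pkg
            return get(fields, "Version", nothing)
        end
    end
    return nothing
end

# URL of the source tarball for a given package version.
tarball_url(pkg, version) = "$CRAN_CONTRIB/$(pkg)_$(version).tar.gz"
```

For example, tarball_url("boot", index_version(fetch_index(), "boot")) would give the URL of the current boot source tarball (assuming the package is listed), which could then be downloaded and unpacked.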

The package has a few limitations. Some follow from this design, while others could be addressed in the future:

  • Does not support R's built-in datasets (including those in the datasets
    package), only datasets from packages that can be downloaded from CRAN
  • Can only load .rda, .RData and .csv.gz files from a package's data directory
    • As a result, it does not support packages that generate their data with a
      build script
  • Cannot retrieve dataset descriptions or other documentation from Julia
    (possible TODO, but this requires parsing .Rd files)
  • Only supports getting the latest version of each package (TODO)
  • Fixed, very limited caching strategy
    • The package index is re-downloaded every time any package needs to be
      downloaded, in order to find the latest version number (TODO: cache it
      per session by default and allow longer caching)
    • Packages are downloaded exactly once per session, after which the same data
      is reused until Julia is restarted (TODO: should be customisable to allow
      longer caching)