Call Rust from R for geo data
MIT License
This repo currently experimental code and a minimal R package to test calling Rust code from R.
The code examples are all documented in this README to keep things simple. To reproduce these examples (and to begin hacking R/Rust code!) you will need to have installed:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
sf
rextendr
, the development version of whichremotes::install_github("extendr/rextendr")
#> Using github PAT from envvar GITHUB_PAT
#> Skipping install of 'rextendr' from a github remote, the SHA1 (bb6b9f1f) has not changed since last install.
#> Use `force = TRUE` to force installation
After you have installed these things you can clone the repo and open the director in an editor of your choice, e.g.with
rstudio georustr/georustr.Rproj # open it in RStudio
# or...
code -r georustr/ # open it in VS Code
code -r georustr/src/rust # open the rust crate in VS Code
Rebuild it from the root directory with the following command from the R command line:
rextendr::document()
#> Generating extendr wrapper functions for package: georustr.
#> 'R/extendr-wrappers.R' is up-to-date. Skip generating wrapper functions.
#> Updating georustr documentation
#> Loading georustr
#> Warning: [/mnt/57982e2a-2874-4246-a6fe-115c199bc6bd/orgs/robinlovelace/georustr/
#> R/csv_to_geojson.R:9] @examples requires a value
#> Writing NAMESPACE
#> Writing NAMESPACE
Load it with:
devtools::load_all()
#> Loading georustr
You can download the test data with the following command:
u = "https://github.com/Robinlovelace/georustr/releases/download/v0.0.0.9000/points.csv"
f = basename(u)
if(!file.exists(f)) {
download.file(url = u, destfile = f)
}
After that the following should work:
n = 1e5
points_df = data.frame(x = rnorm(n = n), y = rnorm(n))
system.time({
points_sf = sf::st_as_sf(points_df, coords = c("x", "y"), crs = 4326)
})
#> user system elapsed
#> 0.037 0.008 0.046
We can do the full csv to geojson process for a fair test as follows:
# run once
readr::write_csv(points_df, "points.csv")
if(file.exists("points.geojson")) file.remove("points.geojson")
#> [1] TRUE
system.time({
csv_to_json_base_r(file_csv = "points.csv")
})
#> Writing output to points.geojson
#> Writing layer `points' to data source `points.geojson' using driver `GeoJSON'
#> Writing 100000 features with 0 fields and geometry type Point.
#> user system elapsed
#> 0.861 0.016 0.880
file.exists("points.geojson")
#> [1] TRUE
if(file.exists("points_rust.geojson")) {
file.remove("points_rust.geojson")
}
#> [1] TRUE
system.time({
csv_to_geojson_rust()
})
#> user system elapsed
#> 0.281 0.048 0.330
file.exists("points_rust.geojson")
#> [1] TRUE
The results show that, for this simple test, Rust is more than 2 times faster than equivalent R/GDAL code. Depending on your application, much greater speed-ups should be possible but calling Rust code from R.
You can also run the code from the system command line:
cd src/rust
cargo test --release
Verify the time taken to run as follows:
time cargo test --release
Check the outputs are the same (they are):
sf::read_sf("points.geojson")
sf::read_sf("points_rust.geojson")