knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
dbplyr is the database backend for dplyr. It allows you to use remote database tables as if they are in-memory data frames by automatically converting dplyr code into SQL.
To learn more about why you might use dbplyr instead of writing SQL, see vignette("sql")
. To learn more about the details of the SQL translation, see vignette("translation-verb")
and vignette("translation-function")
.
# The easiest way to get dbplyr is to install the whole tidyverse:
install.packages("tidyverse")
# Alternatively, install just dbplyr:
install.packages("dbplyr")
# Or the development version from GitHub:
# install.packages("pak")
pak::pak("tidyverse/dbplyr")
dbplyr is designed to work with database tables as if they were local data frames. To demonstrate this I'll first create an in-memory SQLite database and copy over a dataset:
library(dplyr, warn.conflicts = FALSE)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars)
Note that you don't actually need to load dbplyr with library(dbplyr)
; dplyr automatically loads it for you when it sees you working with a database. Database connections are coordinated by the DBI package. Learn more at https://dbi.r-dbi.org/
Now you can retrieve a table using tbl()
(see ?tbl_dbi
for more details). Printing it just retrieves the first few rows:
mtcars2 <- tbl(con, "mtcars")
mtcars2
All dplyr calls are evaluated lazily, generating SQL that is only sent to the database when you request the data.
# lazily generates query
summary <- mtcars2 %>%
group_by(cyl) %>%
summarise(mpg = mean(mpg, na.rm = TRUE)) %>%
arrange(desc(mpg))
# see query
summary %>% show_query()
# execute query and retrieve results
summary %>% collect()
Please note that the dbplyr project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.