Addresses issues with R 4.4.0. The root cause was that version checking functions changed how they work:
package_version() no longer accepts numeric_version() output. The package_version() call was wrapped in a function that coerces the argument if it is of class numeric_version.
Comparison operators (<, >=, etc.) for packageVersion() no longer accept numeric values.
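A minimal sketch of that coercion, with a hypothetical wrapper name:

```r
# Hypothetical wrapper illustrating the fix described above: coerce
# numeric_version input to character before calling package_version().
safe_package_version <- function(x) {
  if (inherits(x, "numeric_version")) {
    x <- as.character(x)
  }
  package_version(x)
}

# Accepts output of packageVersion(), which is a numeric_version subclass:
safe_package_version(packageVersion("stats"))
```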
Adds support for the Databricks "autoloader" (format: cloudFiles) for streaming ingestion of files via stream_read_cloudfiles() (@zacdav-db #3432). Also adds stream_write_table() and stream_read_table(); see the sketch after this entry.
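A hypothetical usage sketch; the argument names and paths are assumptions based on sparklyr's other stream_read_*() functions, so check the package documentation for the exact signatures:

```r
# Assumed usage -- not verified against the documented signatures; the
# Databricks connection method and volume path are illustrative only.
library(sparklyr)

sc <- spark_connect(method = "databricks")
stream <- stream_read_cloudfiles(sc, path = "/Volumes/landing/events/")
stream_write_table(stream, "events_bronze")
```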
Made changes to stream_write_generic (@zacdav-db #3432): the toTable method doesn't allow calling start, so a to_table parameter that adjusts the logic was added; also fixes the path option not being propagated when to_table is TRUE.
Upgrades to Roxygen version 7.3.1
Published by edgararuiz 7 months ago
Fixes quoting issue with dbplyr 2.5.0 (#3429)
Fixes Windows OS identification (#3426)
Removes dependency on tibble; all calls are now redirected to dplyr (#3399)
Removes dependency on rappdirs (#3401); support for sparklyr 0.5 is no longer needed
Converts spark_apply() to a method (#3418)
Spark 2.3 is no longer considered maintained as of September 2019
Updates Delta-to-Spark version matching when using delta as one of the packages when connecting (#3414)
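For example, a connection using that shortcut looks roughly like this (the Spark version shown is illustrative):

```r
# Listing "delta" in packages lets sparklyr resolve the Delta Lake release
# that matches the requested Spark version.
library(sparklyr)

sc <- spark_connect(master = "local", version = "3.4", packages = "delta")
```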
Published by edgararuiz 12 months ago
Fixes related to the new dbplyr version:
Fixes db_connection_describe() S3 consistency error (@t-kalinowski)
Addresses new error from dbplyr that fails when you try to access components from a remote tbl using $
Bumps the version of dbplyr to switch between the two methods to create temporary tables
Addresses new translate_sql() hard requirement to pass a con object. Done by passing the current connection or simulate_hive(); see the sketch after this entry.
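A small sketch of that requirement:

```r
# translate_sql() requires a connection object; simulate_hive() supplies a
# simulated one, so SQL can be generated without a live Spark session.
library(dbplyr)

translate_sql(mean(cyl, na.rm = TRUE), con = simulate_hive())
```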
Small fix to spark_connect_method() arguments. Removes 'hadoop_version'
Improvements to handling pysparklyr load (@t-kalinowski)
Fixes 'subscript out of bounds' issue found by pysparklyr (@t-kalinowski)
Updates available Spark download links
Removes dependency on the following packages:
digest
base64enc
ellipsis
Converts ml_fit() into an S3 method for pysparklyr compatibility
Improvements and fixes to tests (@t-kalinowski)
Fixes test jobs that should have included Arrow but did not
Updates to the Spark versions to be tested
Re-adds tests for development dbplyr
Published by edgararuiz about 1 year ago
Spark error messages are now cached instead of being displayed in full as an R error. The volume of lines returned by Spark used to overwhelm the interactive session's console or notebook. By default, sparklyr now returns the top of the Spark error message, which is typically the most relevant part. The full error can still be accessed using a new function called spark_last_error()
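A sketch of retrieving the full message, assuming an open connection sc (the query is illustrative):

```r
# The R error shows only the top of the Spark message; spark_last_error()
# prints the full text of the most recent failure.
result <- try(sdf_sql(sc, "SELECT * FROM table_that_does_not_exist"))
spark_last_error()
```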
Reduces redundancy on several tests
Handles SQL quoting when the table reference contains multiple levels. The most common case is when a table name is passed using in_catalog() or in_schema().
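For instance (catalog, schema, and table names are illustrative):

```r
# Multi-level table references such as catalog.schema.table are now
# quoted correctly when building the SQL.
library(dplyr)
library(dbplyr)

trips <- tbl(sc, in_catalog("main", "nyc", "trips"))
```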
Prevents an error when na.rm = TRUE is explicitly set within pmax() and pmin(). These functions will now also purposely fail if na.rm is set to FALSE. The default in base R is na.rm = FALSE, but ever since these functions were released there has been no warning or error. For now, we will keep that behavior until a better approach can be figured out. (#3353)
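Illustratively, assuming an open connection sc:

```r
# na.rm = TRUE is accepted inside pmax()/pmin() on Spark tbls; an explicit
# na.rm = FALSE now fails on purpose.
library(dplyr)

mtcars_tbl <- copy_to(sc, mtcars)
mtcars_tbl %>% mutate(highest = pmax(cyl, gear, na.rm = TRUE))
```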
spark_install() will now properly match when a partial version is passed to the function. The issue was that passing '2.3' would match to '3.2.3', instead of '2.3.x' (#3370)
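For example:

```r
# A partial version now resolves within the intended release line:
spark_install(version = "2.3")  # installs a 2.3.x build, not 3.2.3
```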
Adds functionality to allow other packages to provide sparklyr additional back-ends. This effort is mainly focused on adding the ability to integrate with Spark Connect and Databricks Connect through a new package.
New exported functions to integrate with the RStudio IDE. They all have the same spark_ide_ prefix
Modifies several read functions to become exported methods, such as sdf_read_column().
Adds spark_integ_test_skip() function. This is to allow other packages to use sparklyr's test suite; it gives the external package a way to indicate whether a given test should run or be skipped.
If installed, sparklyr will load the pysparklyr package
Published by edgararuiz over 1 year ago
Adds Azure Synapse Analytics connectivity (@Bob-Chou, #3336)
Adds support for "parameterized" queries now available in Spark 3.4 (@gregleleu #3335)
Adds new DBI methods: dbValid and dbDisconnect (@alibell, #3296)
Adds overwrite parameter to dbWriteTable() (@alibell, #3296)
Adds database parameter to dbListTables() (@alibell, #3296)
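A hedged sketch of the new DBI surface, assuming an open sparklyr connection sc (table and database names are illustrative):

```r
# Exercises the parameters added above through the standard DBI generics.
library(DBI)

dbWriteTable(sc, "mtcars_spark", mtcars, overwrite = TRUE)
dbListTables(sc, database = "default")
dbDisconnect(sc)
```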
Adds ability to turn off predicate support (where(), across()) using options("sparklyr.support.predicates" = FALSE). Defaults to TRUE. This should accelerate dplyr commands because it won't need to process column types for every single piped command
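For example (option name taken from the entry above):

```r
# Disable predicate support; the default is TRUE.
options("sparklyr.support.predicates" = FALSE)
```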
Fixes Spark download locations (#3331)
Fix various rlang deprecation warnings (@mgirlich, #3333).
Published by edgararuiz over 1 year ago
Addresses warning from CRAN checks
Addresses option(stringsAsFactors) usage
Fixes root cause of issue processing pivot_wider() and distinct() (#3317 & #3320)
Updates local Spark download sources
Published by edgararuiz about 2 years ago
Adds new metric extraction functions: ml_metrics_binary(), ml_metrics_regression() and ml_metrics_multiclass(). They work closer to how yardstick metric extraction functions work: they expect a table with the predictions and actual values, and return a concise tibble with the metrics. (#3281)
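A hedged sketch, assuming yardstick-style truth/estimate arguments; check the function documentation for the exact interface:

```r
# Hypothetical workflow: score a test set with a fitted model, then
# summarize metrics from the predictions table (argument names assumed).
predictions <- ml_predict(fitted_model, testing_tbl)
ml_metrics_regression(predictions, truth = label, estimate = prediction)
```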
Adds new spark_insert_table() function. This allows one to insert data into an existing table definition without redefining the table, even when overwriting the existing data. (#3272 @jimhester)
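For instance (table and object names are made up):

```r
# Insert rows into an already-defined table without re-issuing its
# definition.
spark_insert_table(new_rows_tbl, "sales_history")
```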
Adds support in ml_cross_validator() for regression models. (#3273)
Adds support for Spark 3.3 local installation. This includes the ability to enable and set up log4j version 2. (#3269)
Updates the JSON file that sparklyr uses to find and download Spark for local use. It is worth mentioning that starting with Spark 3.3, the Hadoop version number no longer uses a minor version in its download link. So, instead of requesting 3.2, the version to request is 3.
Removes workaround for older versions of arrow. Bumps arrow version dependency from 0.14.0 to 0.17.0 (#3283 @nealrichardson)
Removes code related to backwards compatibility with dbplyr. sparklyr requires dbplyr version 2.2.1 or above, so the code is no longer needed. (#3277)
Begins centralizing ML parameter validation into a single function that will run the proper cast function for each Spark parameter. It also starts using S3 methods, instead of searching for a concatenated function name, to find the proper parameter validator. Regression models are the first ones to use this new method. (#3279)
sparklyr compilation routines have been improved and simplified. spark_compile() now provides more informative output when used. It also adds tests to make sure compilation works, and adds a step to install Scala in the corresponding GHAs so that the new JAR build tests are able to run. (#3275)
Stops using package environment variables directly. Any package-level variable will be handled by a genv-prefixed function to set and retrieve values. This avoids the risk of having the exact same variable initialized in more than one R script. (#3274)
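A sketch of the pattern being described, with illustrative names:

```r
# Package-level state lives in one environment and is accessed only through
# genv-prefixed setter/getter functions.
genv_state <- new.env(parent = emptyenv())

genv_set_spark_home <- function(value) {
  assign("spark_home", value, envir = genv_state)
}

genv_get_spark_home <- function() {
  get0("spark_home", envir = genv_state, ifnotfound = NULL)
}
```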
Adds more tests to improve coverage.
Published by edgararuiz over 2 years ago
Supports new dbplyr version (@mgirlich)
Removes stringr dependency
Fixes augment() when the model was fitted via parsnip (#3233)
Published by edgararuiz over 2 years ago
rlang::is_env()
Fixes pivot_wider() S3 consistency issue
Published by yitao-li almost 3 years ago
Implemented support for the .groups parameter for dplyr::summarize() operations on Spark dataframes
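For example, assuming an open connection sc:

```r
# .groups on a Spark dataframe now behaves as it does on local data frames.
library(dplyr)

mtcars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)
mtcars_tbl %>%
  group_by(cyl, gear) %>%
  summarize(avg_mpg = mean(mpg), .groups = "drop")
```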
Fixed the incorrect handling of the remove = TRUE option for separate.tbl_spark()
Optimized away an extra count query when collecting Spark dataframes from
Spark to R.
By default, use links from the https://dlcdn.apache.org site for downloading
Apache Spark when possible.
Attempt to continue the spark_install() process even if the Spark version specified is not present in the inst/extdata/versions*.json files (in which case sparklyr will guess the URL of the tarball based on the existing and well-known naming convention used by https://archive.apache.org, i.e., https://archive.apache.org/dist/spark/spark-${spark version}/spark-${spark version}-bin-hadoop${hadoop version}.tgz)
Revised inst/extdata/versions*.json files to reflect recent releases of Apache Spark.
Implemented sparklyr_get_backend_port() for querying the port number used by the sparklyr backend.