Fast multivariate imputation by random forests.
GPL-2.0 License
Fixes a major bug, by which responses would be used as covariates in the random forests. Thanks for reporting @flystar233, see #78.
You can expect different and better imputations.
Out-of-sample application is now possible! Thanks to @jeandigitale for pushing the idea in #58.
This means you can run imp <- missRanger(..., keep_forests = TRUE) and then apply its models to new data via predict(imp, newdata). The "missRanger" object can be saved/loaded as a binary file, e.g., via saveRDS()/readRDS() for later use.
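A minimal sketch of that workflow, assuming missRanger 2.6.0 or later is installed. The random split of iris, the temporary file path, and the small num.trees value are illustrative choices, not part of the release notes; generateNA() is the package helper for adding missing values.

```r
library(missRanger)

# Illustrative random split of iris into training rows and "new" rows
set.seed(1)
idx <- sample(nrow(iris), 100)
train <- generateNA(iris[idx, ], p = 0.1, seed = 1)
newdata <- generateNA(iris[-idx, ], p = 0.1, seed = 2)

# Fit the imputation models and keep the forests for later reuse
imp <- missRanger(train, num.trees = 50, keep_forests = TRUE, verbose = 0, seed = 1)

# Save the "missRanger" object and load it again, e.g., in another session
path <- tempfile(fileext = ".rds")
saveRDS(imp, path)
imp_loaded <- readRDS(path)

# Apply the stored forests to the new data
new_imputed <- predict(imp_loaded, newdata)
```

Because the forests are stored in the object, the saved file can become large for big data sets or many trees.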
Note that out-of-sample imputation works best for rows in newdata with only one missing value (counting only missings in variables used as covariates in random forests). We call this the "easy case". In the "hard case", even multiple iterations (set by iter) can lead to unsatisfactory results.
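To get a feeling for how many rows of a new data set fall into each case, one can count missing values per row. The following base R sketch uses a made-up data frame and assumes that all of its columns are used as covariates; the labels "complete", "easy", and "hard" are only for illustration.

```r
# Hypothetical new data with missing values
newdata <- data.frame(
  x1 = c(1.2, NA, 3.1, NA),
  x2 = c(NA, NA, 0.5, 2.0),
  x3 = c("a", "b", NA, "c")
)

# Number of missing values per row
n_missing <- rowSums(is.na(newdata))

# Label rows: no missing value, exactly one ("easy"), or several ("hard")
case <- cut(
  n_missing,
  breaks = c(-Inf, 0, 1, Inf),
  labels = c("complete", "easy", "hard")
)
table(case)
```

Here, only the second row has more than one missing value and therefore counts as a hard case.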
The out-of-sample algorithm works as follows:
pmm() is more picky: xtrain and xtest must both be either numeric, logical, or factor (with identical levels).
Added original data as data_raw.
Renamed visit_seq to to_impute.
Many relevant ranger() arguments are now explicit arguments in missRanger() to improve tab-completion experience.
If keep_forests = TRUE, the argument data_only is set to FALSE by default.
The "missRanger" object now stores pmm.k.
The verbose argument is passed to ranger() as well.
Published by mayer79 3 months ago
Published by mayer79 11 months ago
New argument data_only = TRUE to control if only the imputed data should be returned (default), or an object of class "missRanger". This object contains the imputed data and information like OOB prediction errors, fixing #28. The value FALSE will later become the default in {missRanger 3.0.0}. This will be announced via a deprecation cycle.
New argument keep_forests = FALSE. Should the random forests of the best iteration (the one that generated the final imputed data) be added to the "missRanger" object? Note that this will use a lot of memory. Only relevant if data_only = FALSE. This solves #54.
Published by mayer79 12 months ago
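The difference between the two return types can be sketched as follows, assuming a missRanger version that has the data_only argument. The iris data and the small num.trees value are illustrative; the component name data is what the package documentation describes for the imputed data, and names() shows what is actually stored in the installed version.

```r
library(missRanger)

iris_na <- generateNA(iris, p = 0.1, seed = 1)

# data_only = TRUE: only the imputed data frame is returned
imp_df <- missRanger(iris_na, num.trees = 50, data_only = TRUE, verbose = 0, seed = 1)

# data_only = FALSE: an object of class "missRanger" is returned
imp_obj <- missRanger(iris_na, num.trees = 50, data_only = FALSE, verbose = 0, seed = 1)

class(imp_obj)
names(imp_obj)      # lists the stored components
head(imp_obj$data)  # the imputed data
```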
missRanger() now works with syntactically wrong variable names like "1bad:variable". This solves an old issue, recently popping up in this new issue.
missRanger() now works with any number of features, as long as the formula is left at its default (. ~ .). This solves this issue.
ranger() is now called via the x/y interface, not the formula interface anymore.
Published by mayer79 over 1 year ago
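A short sketch of imputing a data set with a non-syntactic column name, assuming a missRanger version that includes this fix. The renamed iris column and the small num.trees value are only for illustration, and the formula is left at its default.

```r
library(missRanger)

# Give one column a name that is not a valid R identifier
df <- generateNA(iris, p = 0.1, seed = 1)
names(df)[1] <- "1bad:variable"

# Impute with the default formula . ~ .
imp <- missRanger(df, num.trees = 20, verbose = 0, seed = 1)

names(imp)  # column names are kept as they are
```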
Switched from importFrom to :: code style.
Published by mayer79 over 1 year ago
Published by mayer79 over 2 years ago
A maintenance release, mainly improving the package structuring.
Published by mayer79 over 3 years ago