missRanger

Fast multivariate imputation by random forests.

GPL-2.0 License

missRanger - CRAN release 2.6.0 (latest release)

Published by mayer79 2 months ago

Major bug fix

Fixes a major bug that caused responses to be used as covariates in the random forests. Thanks to @flystar233 for reporting, see #78.
You can expect different and better imputations.

Major feature

Out-of-sample application is now possible! Thanks to @jeandigitale for pushing the idea in #58.

This means you can run imp <- missRanger(..., keep_forests = TRUE) and then apply its models to new data via predict(imp, newdata). The "missRanger" object can be saved/loaded as a binary file, e.g., via saveRDS()/readRDS(), for later use.
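
A minimal sketch of this workflow, assuming the iris data with missing values inserted by the package's generateNA() helper. The file name and the small number of trees are illustrative choices, not recommendations:

```r
library(missRanger)

# Training data with about 10% missing values per column (illustrative setup)
iris_na <- generateNA(iris, p = 0.1, seed = 1)

# Fit the imputation models and keep the forests for later use
imp <- missRanger(iris_na, num.trees = 50, keep_forests = TRUE, seed = 1, verbose = 0)

# Save the fitted object and load it again (hypothetical file name)
path <- file.path(tempdir(), "imp.rds")
saveRDS(imp, path)
imp2 <- readRDS(path)

# New data with a single missing value in the first row
newdata <- iris[1:5, ]
newdata$Sepal.Length[1] <- NA

# Apply the stored forests to the new data
filled <- predict(imp2, newdata)
```

Since keep_forests = TRUE implies data_only = FALSE, imp is a "missRanger" object rather than a plain data frame.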

Note that out-of-sample imputation works best for rows in newdata with only one missing value (counting only missing values in variables used as covariates in random forests). We call this the "easy case". In the "hard case", even multiple iterations (set by iter) can lead to unsatisfactory results.

The out-of-sample algorithm works as follows:

  1. Impute univariately all relevant columns by randomly drawing values
    from the original unimputed data. This step will only impact "hard case" rows.
  2. Replace univariate imputations by predictions of random forests. This is done
    sequentially over variables, where the variables are sorted to minimize the impact
    of univariate imputations. Optionally, this is followed by predictive mean matching (PMM).
  3. Repeat Step 2 for "hard case" rows multiple times.
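
The three steps can be illustrated with a toy re-implementation that calls ranger directly. This is only a sketch under simplifying assumptions, not the package's code: it handles numeric columns only, trains the forests on complete data, skips PMM, and visits variables in order of increasing number of missing values (an assumed heuristic). The names train, forests, and impute_new are hypothetical.

```r
library(ranger)

# Toy setup: one forest per variable, trained on complete numeric data
train <- iris[, 1:4]
vars <- names(train)
forests <- lapply(vars, function(v) {
  ranger(x = train[, setdiff(vars, v)], y = train[[v]], num.trees = 50, seed = 1)
})
names(forests) <- vars

impute_new <- function(newdata, iter = 4) {
  miss <- is.na(newdata[, vars])  # remember which cells were missing
  hard <- rowSums(miss) > 1       # "hard case": more than one missing value

  # Step 1: univariate imputation by random draws from the training data
  for (v in vars) {
    n <- sum(miss[, v])
    if (n > 0) newdata[[v]][miss[, v]] <- sample(train[[v]], n, replace = TRUE)
  }

  # Assumed ordering: variables with few missing values are visited first
  n_miss <- colSums(miss)
  ord <- vars[order(n_miss)]
  ord <- ord[n_miss[ord] > 0]

  # Step 2 on all rows, then Step 3: repeat for "hard case" rows only
  for (j in seq_len(iter)) {
    rows <- if (j == 1) rep(TRUE, nrow(newdata)) else hard
    if (!any(rows)) break
    for (v in ord) {
      idx <- miss[, v] & rows
      if (any(idx)) {
        x_new <- newdata[idx, setdiff(vars, v), drop = FALSE]
        newdata[[v]][idx] <- predict(forests[[v]], data = x_new)$predictions
      }
    }
  }
  newdata
}
```

For an "easy case" row, the univariate draw in Step 1 is overwritten immediately by a forest prediction that depends only on observed values, which is why Step 1 affects only "hard case" rows.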

Possibly breaking changes

  • Columns of special type like date/time can't be imputed anymore. You will need to convert them to numeric before imputation.
  • pmm() is more picky: xtrain and xtest must both be either numeric, logical, or factor (with identical levels).
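
As a sketch of the conversion mentioned in the first bullet, a Date column could be turned into numeric before imputation and restored afterwards. The data frame and its column names are made up for illustration:

```r
library(missRanger)

set.seed(1)
df <- data.frame(
  when = as.Date("2024-01-01") + 0:49,
  x = rnorm(50)
)
df$y <- df$x + rnorm(50, sd = 0.1)
df$when[c(3, 10)] <- NA
df$x[c(5, 20)] <- NA

# Convert the Date column to numeric (days since 1970-01-01) before imputation
df$when <- as.numeric(df$when)

imp <- missRanger(df, num.trees = 50, seed = 1, verbose = 0)

# Convert back afterwards; imputed values may be fractional, so round first
imp$when <- as.Date(round(imp$when), origin = "1970-01-01")
```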

Minor changes in output object

  • Add original data as data_raw.
  • Renamed visit_seq to to_impute.

Other changes

  • Now requires ranger >= 0.16.0.
  • More compact vignettes.
  • Better examples and README.
  • Many relevant ranger() arguments are now explicit arguments in missRanger() to improve tab-completion experience:
    • num.trees = 500
    • mtry = NULL
    • min.node.size = NULL
    • min.bucket = NULL
    • max.depth = NULL
    • replace = TRUE
    • sample.fraction = if (replace) 1 else 0.632
    • case.weights = NULL
    • num.threads = NULL
    • save.memory = FALSE
  • For variables that can't be used, more information is printed.
  • If keep_forests = TRUE, the argument data_only is set to FALSE by default.
  • "missRanger" object now stores pmm.k.
  • verbose argument is passed to ranger() as well.
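
For illustration, a call that sets some of these explicit arguments might look as follows. The values are arbitrary and only show the argument names listed above:

```r
library(missRanger)

iris_na <- generateNA(iris, p = 0.1, seed = 1)

imp <- missRanger(
  iris_na,
  pmm.k = 5,              # predictive mean matching with 5 donors
  num.trees = 100,
  max.depth = 6,
  min.node.size = 5,
  replace = FALSE,        # sample without replacement ...
  sample.fraction = 0.5,  # ... using half of the rows per tree
  num.threads = 1,
  seed = 1,
  verbose = 0
)
```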

missRanger - CRAN release 2.5.0

Published by mayer79 3 months ago

Bug fixes

  • Since release 2.3.0, negative formula terms were unintentionally not dropped, see #62. This is now fixed.
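
As an example of a negative formula term, the following sketch imputes all variables but excludes Species from the covariates (iris with artificial missing values is an assumed setup):

```r
library(missRanger)

iris_na <- generateNA(iris, p = 0.1, seed = 1)

# Impute every variable, using all variables except Species as covariates
imp <- missRanger(iris_na, . ~ . - Species, num.trees = 50, seed = 1, verbose = 0)
```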

Enhancements

  • The vignette on multiple imputations has been revised, and the example now uses a larger number of donors in predictive mean matching.

missRanger - CRAN release 2.4.0

Published by mayer79 11 months ago

Future Output API

  • New argument data_only = TRUE to control whether only the imputed data should be returned (default), or an object of class "missRanger". This object contains the imputed data and information like OOB prediction errors, fixing #28. The value FALSE will later become the default in {missRanger 3.0.0}. This will be announced via a deprecation cycle.

Enhancements

  • New argument keep_forests = FALSE. Should the random forests of the best iteration (the one that generated the final imputed data) be added to the "missRanger" object? Note that this will use a lot of memory. Only relevant if data_only = FALSE. This solves #54.
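
A short sketch of how the two arguments might be combined, again assuming iris with artificial missing values. Only the data component is accessed by name here; names(imp) lists whatever else the object stores:

```r
library(missRanger)

iris_na <- generateNA(iris, p = 0.1, seed = 1)

# Return the full object instead of only the imputed data, including the forests
imp <- missRanger(
  iris_na,
  num.trees = 50,
  data_only = FALSE,
  keep_forests = TRUE,
  seed = 1,
  verbose = 0
)

class(imp)      # should include "missRanger"
head(imp$data)  # the imputed data
names(imp)      # further stored components
```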

Bug fixes

  • In case the algorithm did not converge, the data of the last iteration was returned instead of the current one. This has been fixed.

missRanger - CRAN release 2.3.0

Published by mayer79 12 months ago

Major improvements

  • missRanger() now works with syntactically invalid variable names like "1bad:variable". This solves an old issue, which recently resurfaced in this new issue.
  • missRanger() now works with any number of features, as long as the formula is left at its default (. ~ .). This solves this issue.
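
A small check of the first point, assuming a column of iris is renamed to the non-syntactic name from the example above:

```r
library(missRanger)

df <- generateNA(iris, p = 0.1, seed = 1)
names(df)[1] <- "1bad:variable"  # not a syntactically valid R name

# The default formula (. ~ .) is used, so no formula needs to be written out
imp <- missRanger(df, num.trees = 50, seed = 1, verbose = 0)
```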

Other changes

  • Documentation improvement.
  • ranger() is now called via the x/y interface, not the formula interface anymore.

missRanger - CRAN release 2.2.1

Published by mayer79 over 1 year ago

  • Switched from importFrom to the :: code style.
  • Documentation improved.

missRanger - CRAN release 2.2.0

Published by mayer79 over 1 year ago

Fewer dependencies

  • Removed {mice} from "suggested" packages.
  • Removed {dplyr} from "suggested" packages.
  • Removed {survival} from "suggested" packages.

Maintenance

  • Added GitHub Pages.
  • Introduced GitHub Actions.

missRanger - Release 2.1.5

Published by mayer79 over 2 years ago

A maintenance release, mainly improving the package structuring.

missRanger - CRAN release 2.1.3

Published by mayer79 over 3 years ago