imbalanced-learn

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

MIT License

Downloads
17.9M
Stars
6.7K
Committers
77


imbalanced-learn - Imbalanced-learn 0.12.3 Latest Release

Published by glemaitre 5 months ago

Changelog

Compatibility

imbalanced-learn - Imbalanced-learn 0.12.2

Published by glemaitre 7 months ago

Changelog

Bug fixes

imbalanced-learn - Imbalanced-learn 0.12.1

Published by glemaitre 7 months ago

Changelog

Bug fixes

Compatibility

imbalanced-learn - Imbalanced-learn 0.12.0

Published by glemaitre 9 months ago

Changelog

Bug fixes

Compatibility

Deprecations

Enhancements

  • Allow outputting a DataFrame with sparse format if provided as input. #1059 by ts2095.
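
A minimal sketch (an illustration, not part of the changelog) of the behaviour described above: a DataFrame with sparse-dtype columns passed to a sampler is returned with the sparse format preserved:

    import numpy as np
    import pandas as pd
    from imblearn.under_sampling import RandomUnderSampler

    rng = np.random.RandomState(0)
    X = pd.DataFrame({
        "f0": pd.arrays.SparseArray(rng.randint(0, 2, size=100)),
        "f1": pd.arrays.SparseArray(rng.randint(0, 2, size=100)),
    })
    y = pd.Series([0] * 90 + [1] * 10)

    X_res, y_res = RandomUnderSampler(random_state=0).fit_resample(X, y)
    print(X_res.dtypes)  # the sparse dtypes of the input are kept in the output
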
imbalanced-learn - imbalanced-learn 0.11.0

Published by glemaitre over 1 year ago

Changelog

Bug fixes

Compatibility

Deprecation

Enhancements

  • SMOTENC now accepts a parameter categorical_encoder allowing a OneHotEncoder with custom parameters to be specified. #1000 by Guillaume Lemaitre.

  • SMOTEN now accepts a parameter categorical_encoder allowing an OrdinalEncoder with custom parameters to be specified. A new fitted attribute categorical_encoder_ is exposed to access the fitted encoder. #1001 by Guillaume Lemaitre.

  • RandomUnderSampler and RandomOverSampler (when shrinkage is not None) now accept any data types and will not attempt any data conversion. #1004 by Guillaume Lemaitre.

  • SMOTENC now supports passing an array-like of str for the categorical_features parameter. #1008 by Guillaume Lemaitre.

  • SMOTENC now supports automatic categorical inference when categorical_features is set to "auto". #1009 by Guillaume Lemaitre.
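
A minimal sketch (an illustration, not part of the changelog) combining these additions: SMOTENC fitted on a DataFrame, with the categorical column referenced by name and a custom OneHotEncoder passed through categorical_encoder:

    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import OneHotEncoder
    from imblearn.over_sampling import SMOTENC

    rng = np.random.RandomState(0)
    X = pd.DataFrame({
        "amount": rng.randn(100),
        "color": rng.choice(["red", "green", "blue"], size=100),
    })
    y = np.array([0] * 90 + [1] * 10)

    smote_nc = SMOTENC(
        categorical_features=["color"],  # array-like of str (#1008)
        categorical_encoder=OneHotEncoder(handle_unknown="ignore"),  # custom encoder (#1000)
        random_state=0,
    )
    X_res, y_res = smote_nc.fit_resample(X, y)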

imbalanced-learn - imbalanced-learn 0.10.1

Published by glemaitre over 1 year ago

Changelog

Bug fixes

  • Fix a regression in over-samplers where the string "minority" was rejected as an invalid sampling strategy. #964 by Prakhyath07.

imbalanced-learn - imbalanced-learn 0.10.0

Published by glemaitre almost 2 years ago

Changelog

Bug fixes

  • Make sure that Substitution works with python -OO, which replaces __doc__ by None. #953 by Guillaume Lemaitre.

Compatibility

Deprecation

Enhancements

  • Add support for compatible NearestNeighbors objects through duck typing only. For instance, this allows cuML instances to be accepted. #858 by NV-jpt and Guillaume Lemaitre.
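
A minimal sketch (an assumption about typical usage, not from the changelog): any object duck-typing the scikit-learn NearestNeighbors API can be passed as k_neighbors; scikit-learn's own estimator stands in here for a cuML one:

    from sklearn.datasets import make_classification
    from sklearn.neighbors import NearestNeighbors
    from imblearn.over_sampling import SMOTE

    X, y = make_classification(n_samples=200, weights=[0.9, 0.1], random_state=0)

    # A cuML NearestNeighbors instance could be passed instead of this one.
    knn = NearestNeighbors(n_neighbors=6)
    X_res, y_res = SMOTE(k_neighbors=knn, random_state=0).fit_resample(X, y)
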
imbalanced-learn - Version 0.9.1

Published by glemaitre over 2 years ago

Compatibility with scikit-learn 1.1.0

imbalanced-learn - Version 0.9.0

Published by glemaitre almost 3 years ago

Compatibility with scikit-learn 1.0.2

imbalanced-learn - Version 0.8.1

Published by glemaitre about 3 years ago

Version 0.8.1

September 29, 2021

Maintenance

Make imbalanced-learn compatible with scikit-learn 1.0. #864 by Guillaume Lemaitre.

imbalanced-learn - Version 0.8.0

Published by glemaitre over 3 years ago

Version 0.8.0

February 18, 2021

Changelog

New features

  • Add the function imblearn.metrics.macro_averaged_mean_absolute_error returning the average of the MAE across classes. This metric is used in ordinal classification. #780 by Aurélien Massiot.
  • Add the class imblearn.metrics.pairwise.ValueDifferenceMetric to compute pairwise distances between samples containing only categorical values. #796 by Guillaume Lemaitre.
  • Add the class imblearn.over_sampling.SMOTEN to over-sample data only containing categorical features. #802 by Guillaume Lemaitre.
  • Add the possibility to pass any type of sampler to imblearn.ensemble.BalancedBaggingClassifier, unlocking the implementation of methods based on resampled bagging. #808 by Guillaume Lemaitre.
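
A minimal sketch (an illustration, not part of the changelog) of two of these additions: a BalancedBaggingClassifier built around an arbitrary sampler and scored with the macro-averaged MAE:

    from sklearn.datasets import make_classification
    from imblearn.ensemble import BalancedBaggingClassifier
    from imblearn.metrics import macro_averaged_mean_absolute_error
    from imblearn.under_sampling import NearMiss

    X, y = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=0)

    # Any sampler can now be plugged into the resampled bagging procedure.
    clf = BalancedBaggingClassifier(sampler=NearMiss(), random_state=0).fit(X, y)
    print(macro_averaged_mean_absolute_error(y, clf.predict(X)))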

Enhancements

  • Add an option output_dict in imblearn.metrics.classification_report_imbalanced to return a dictionary instead of a string. #770 by Guillaume Lemaitre.
  • Added an option to generate a smoothed bootstrap in imblearn.over_sampling.RandomOverSampler, controlled by the parameter shrinkage. This method is also known as Random Over-Sampling Examples (ROSE). #754 by Andrea Lorenzon and Guillaume Lemaitre.
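
A minimal sketch (an illustration, not part of the changelog) of the smoothed bootstrap: with shrinkage set, RandomOverSampler draws perturbed copies of the minority samples instead of exact duplicates:

    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import RandomOverSampler

    X, y = make_classification(n_samples=200, weights=[0.9, 0.1], random_state=0)

    ros = RandomOverSampler(shrinkage=0.2, random_state=0)  # smoothed bootstrap (ROSE)
    X_res, y_res = ros.fit_resample(X, y)
    print(Counter(y_res))  # the classes are now balanced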

Bug fixes

  • Fix a bug in imblearn.under_sampling.ClusterCentroids where voting="hard" could have led to selecting a sample from any class instead of the targeted class. #769 by Guillaume Lemaitre.
  • Fix a bug in imblearn.FunctionSampler where validation was performed even with validate=False when calling fit. #790 by Guillaume Lemaitre.

Maintenance

  • Remove requirements files in favour of adding the packages in the extras_require within the setup.py file. #816 by Guillaume Lemaitre.
  • Change the website template to use pydata-sphinx-theme. #801 by Guillaume Lemaitre.

Deprecation

  • The context manager imblearn.utils.testing.warns is deprecated in 0.8 and will be removed in 1.0. #815 by Guillaume Lemaitre.

imbalanced-learn - Version 0.7.0

Published by glemaitre over 4 years ago

A release to bump the minimum version of scikit-learn to 0.23, with a couple of bug fixes.
Check the what's new page for more information.

imbalanced-learn - Version 0.6.2

Published by glemaitre over 4 years ago

This is a bug-fix release to resolve some issues regarding the handling of the input and output formats of the arrays.

Changelog

  • Allow column vectors to be passed as targets. #673 by @chkoar.
  • Better input/output handling for pandas, numpy and plain lists. #681 by @chkoar.
imbalanced-learn - Version 0.6.1

Published by glemaitre almost 5 years ago

This is a bug-fix release to primarily resolve some packaging issues in version 0.6.0. It also includes minor documentation improvements and some bug fixes.

Changelog

Bug fixes

  • Fix a bug in :class:imblearn.ensemble.BalancedRandomForestClassifier leading to a wrong number of samples used during fitting due to max_samples and therefore a bad computation of the OOB score. :pr:656 by :user:Guillaume Lemaitre <glemaitre>.
imbalanced-learn - Version 0.6.0

Published by glemaitre almost 5 years ago

Changelog

Changed models
..............

The following models might give different sampling results due to changes in
scikit-learn:

  • :class:imblearn.under_sampling.ClusterCentroids
  • :class:imblearn.under_sampling.InstanceHardnessThreshold

The following samplers will give different results due to changes linked to
the internal usage of the random state:

  • :class:imblearn.over_sampling.SMOTENC

Bug fixes
.........

  • :class:imblearn.under_sampling.InstanceHardnessThreshold now takes into
    account the random_state and will give deterministic results. In addition,
    cross_val_predict is used to take advantage of parallelism.
    :pr:599 by :user:Shihab Shahriar Khan <Shihab-Shahriar>.

  • Fix a bug in :class:imblearn.ensemble.BalancedRandomForestClassifier
    leading to a wrong computation of the OOB score.
    :pr:656 by :user:Guillaume Lemaitre <glemaitre>.

Maintenance
...........

  • Update imports from scikit-learn after some modules have been made private.
    The following imports have been changed:
    :class:sklearn.ensemble._base._set_random_states,
    :class:sklearn.ensemble._forest._parallel_build_trees,
    :class:sklearn.metrics._classification._check_targets,
    :class:sklearn.metrics._classification._prf_divide,
    :class:sklearn.utils.Bunch,
    :class:sklearn.utils._safe_indexing,
    :class:sklearn.utils._testing.assert_allclose,
    :class:sklearn.utils._testing.assert_array_equal,
    :class:sklearn.utils._testing.SkipTest.
    :pr:617 by :user:Guillaume Lemaitre <glemaitre>.

  • Synchronize :mod:imblearn.pipeline with :mod:sklearn.pipeline.
    :pr:620 by :user:Guillaume Lemaitre <glemaitre>.

  • Synchronize :class:imblearn.ensemble.BalancedRandomForestClassifier and add
    parameters max_samples and ccp_alpha.
    :pr:621 by :user:Guillaume Lemaitre <glemaitre>.

Enhancement
...........

  • :class:imblearn.under_sampling.RandomUnderSampler,
    :class:imblearn.over_sampling.RandomOverSampler,
    :class:imblearn.datasets.make_imbalance accept a Pandas DataFrame as input
    and will output a Pandas DataFrame. Similarly, they accept a Pandas Series
    as input and will output a Pandas Series.
    :pr:636 by :user:Guillaume Lemaitre <glemaitre>.

  • :class:imblearn.FunctionSampler accepts a parameter validate allowing
    validation of the input X and y to be enabled or disabled (see the sketch
    after this list).
    :pr:637 by :user:Guillaume Lemaitre <glemaitre>.

  • :class:imblearn.under_sampling.RandomUnderSampler,
    :class:imblearn.over_sampling.RandomOverSampler can resample when non-finite
    values are present in X.
    :pr:643 by :user:Guillaume Lemaitre <glemaitre>.

  • All samplers will output a Pandas DataFrame if a Pandas DataFrame was given
    as an input.
    :pr:644 by :user:Guillaume Lemaitre <glemaitre>.

  • The samples generation in
    :class:imblearn.over_sampling.SMOTE,
    :class:imblearn.over_sampling.BorderlineSMOTE,
    :class:imblearn.over_sampling.SVMSMOTE,
    :class:imblearn.over_sampling.KMeansSMOTE,
    :class:imblearn.over_sampling.SMOTENC is now vectorized, giving an
    additional speed-up when X is sparse.
    :pr:596 by :user:Matt Eding <MattEding>.
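
A minimal sketch (an illustration, not from the changelog) of the validate parameter mentioned above: with validate=False, the user-provided function receives X and y untouched, so inputs that would not pass scikit-learn's validation can still be resampled:

    from imblearn import FunctionSampler

    def keep_first_half(X, y):
        # Custom resampling logic: simply keep the first half of the data.
        n = len(X) // 2
        return X[:n], y[:n]

    sampler = FunctionSampler(func=keep_first_half, validate=False)
    X_res, y_res = sampler.fit_resample(["a", "b", "c", "d"], [0, 0, 1, 1])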

Deprecation
...........

  • The following classes have been removed after 2 deprecation cycles:
    ensemble.BalanceCascade and ensemble.EasyEnsemble.
    :pr:617 by :user:Guillaume Lemaitre <glemaitre>.

  • The following functions have been removed after 2 deprecation cycles:
    utils.check_ratio.
    :pr:617 by :user:Guillaume Lemaitre <glemaitre>.

  • The parameters ratio and return_indices have been removed from all
    samplers.
    :pr:617 by :user:Guillaume Lemaitre <glemaitre>.

  • The parameters m_neighbors, out_step, kind, svm_estimator
    have been removed from the :class:imblearn.over_sampling.SMOTE.
    :pr:617 by :user:Guillaume Lemaitre <glemaitre>.

imbalanced-learn - 0.5.0

Published by glemaitre over 5 years ago

Version 0.5.0

Changed models

The following models or functions might give different results even if the
same data X and y are provided.

  • :class:imblearn.ensemble.RUSBoostClassifier default estimator changed from
    :class:sklearn.tree.DecisionTreeClassifier with full depth to a decision
    stump (i.e., tree with max_depth=1).

Documentation

  • Correct the definition of the ratio when using a float in sampling
    strategy for the over-sampling and under-sampling.
    :issue:525 by :user:Ariel Rossanigo <arielrossanigo>.

  • Add :class:imblearn.over_sampling.BorderlineSMOTE and
    :class:imblearn.over_sampling.SVMSMOTE in the API documentation.
    :issue:530 by :user:Guillaume Lemaitre <glemaitre>.

Enhancement

  • Add Parallelisation for SMOTEENN and SMOTETomek.
    :pr:547 by :user:Michael Hsieh <Microsheep>.

  • Add :class:imblearn.utils._show_versions. Updated the contribution guide
    and issue template showing how to print system and dependency information
    from the command line. :pr:557 by :user:Alexander L. Hayes <batflyer>.

  • Add :class:imblearn.over_sampling.KMeansSMOTE, an over-sampler that
    clusters points before applying SMOTE.
    :pr:435 by :user:Stephan Heijl <StephanHeijl>.
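
A minimal sketch (an illustration with purely illustrative parameters, not from the changelog) of :class:imblearn.over_sampling.KMeansSMOTE: the feature space is clustered first and SMOTE is applied within sufficiently balanced clusters:

    from sklearn.cluster import MiniBatchKMeans
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import KMeansSMOTE

    X, y = make_classification(n_samples=500, weights=[0.85, 0.15], random_state=0)

    sampler = KMeansSMOTE(
        kmeans_estimator=MiniBatchKMeans(n_clusters=5, random_state=0),
        cluster_balance_threshold=0.1,  # only resample clusters with enough minority samples
        random_state=0,
    )
    X_res, y_res = sampler.fit_resample(X, y)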

Maintenance

  • Make it possible to import imblearn and access submodules.
    :pr:500 by :user:Guillaume Lemaitre <glemaitre>.

  • Remove support for Python 2, remove deprecation warning from
    scikit-learn 0.21.
    :pr:576 by :user:Guillaume Lemaitre <glemaitre>.

Bug

  • Fix wrong usage of :class:keras.layers.BatchNormalization in
    porto_seguro_keras_under_sampling.py example. The batch normalization
    was moved before the activation function and the bias was removed from the
    dense layer.
    :pr:531 by :user:Guillaume Lemaitre <glemaitre>.

  • Fix a bug which converted sparse matrices to COO format when stacking them
    in :class:imblearn.over_sampling.SMOTENC. This bug affected only old scipy
    versions.
    :pr:539 by :user:Guillaume Lemaitre <glemaitre>.

  • Fix bug in :class:imblearn.pipeline.Pipeline where None could be the final
    estimator.
    :pr:554 by :user:Oliver Rausch <orausch>.

  • Fix bug in :class:imblearn.over_sampling.SVMSMOTE and
    :class:imblearn.over_sampling.BorderlineSMOTE where the default parameter
    of n_neighbors was not set properly.
    :pr:578 by :user:Guillaume Lemaitre <glemaitre>.

  • Fix bug by changing the default depth in
    :class:imblearn.ensemble.RUSBoostClassifier to get a decision stump as a
    weak learner as in the original paper.
    :pr:545 by :user:Christos Aridas <chkoar>.

  • Allow importing keras directly from tensorflow in the
    :mod:imblearn.keras module.
    :pr:531 by :user:Guillaume Lemaitre <glemaitre>.

imbalanced-learn - 0.4.3

Published by glemaitre almost 6 years ago

Mainly bug fixes in SMOTENC.

imbalanced-learn - 0.4.2

Published by glemaitre almost 6 years ago

Version 0.4.2

Bug fixes

  • Fix a bug in imblearn.over_sampling.SMOTENC in which the median of the standard deviation was used instead of half of the median of the standard deviation. By Guillaume Lemaitre in #491.
  • Raise an error when passing targets which are not supported, i.e. regression targets or multilabel targets. Imbalanced-learn does not support these cases. By Guillaume Lemaitre in #490.

imbalanced-learn - 0.4.1

Published by glemaitre about 6 years ago

Version 0.4

October, 2018

Version 0.4 is the last version of imbalanced-learn to support Python 2.7
and Python 3.4. Imbalanced-learn 0.5 will require Python 3.5 or higher.

Highlights

This release brings its set of new features as well as some API changes to
strengthen the foundation of imbalanced-learn.

As new features, 2 new modules, imblearn.keras and imblearn.tensorflow, have
been added in which imbalanced-learn samplers can be used to generate balanced
mini-batches.

The module imblearn.ensemble has been consolidated with new classifiers:
imblearn.ensemble.BalancedRandomForestClassifier,
imblearn.ensemble.EasyEnsembleClassifier,
imblearn.ensemble.RUSBoostClassifier.

Support for string data has been added in
imblearn.over_sampling.RandomOverSampler and
imblearn.under_sampling.RandomUnderSampler. In addition, a new class
imblearn.over_sampling.SMOTENC allows generating samples from data sets
containing both continuous and categorical features.

imblearn.over_sampling.SMOTE has been simplified and broken down into 2
additional classes:
imblearn.over_sampling.SVMSMOTE and
imblearn.over_sampling.BorderlineSMOTE.

There are also some changes regarding the API:
the parameter sampling_strategy has been introduced to replace the
ratio parameter. In addition, the return_indices argument has been
deprecated and all samplers will expose a sample_indices_ attribute whenever
possible.
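
A minimal sketch (an illustration, not from the release notes) of the reworked API: sampling_strategy replaces ratio, and the indices of the retained samples are available through sample_indices_ rather than return_indices:

    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.under_sampling import RandomUnderSampler

    X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

    # A float sampling_strategy is the desired ratio of minority to majority samples.
    rus = RandomUnderSampler(sampling_strategy=0.5, random_state=0)
    X_res, y_res = rus.fit_resample(X, y)
    print(Counter(y_res))
    print(rus.sample_indices_[:10])  # indices of the kept samples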

imbalanced-learn - 0.4.0

Published by glemaitre about 6 years ago

Version 0.4

October, 2018


Package Rankings
Top 0.56% on Pypi.org
Top 15.37% on Spack.io
Top 4.1% on Alpine-v3.17
Top 19.71% on Anaconda.org
Top 4.69% on Conda-forge.org