A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
MIT License
Published by glemaitre 7 months ago
Published by glemaitre 9 months ago
Published by glemaitre over 1 year ago
- Fix a bug in classification_report_imbalanced where the parameter target_names was not taken into account when output_dict=True. #989 by AYY7.
- SMOTENC now handles mixed data types such as bool and pd.CategoricalDtype by delegating the conversion to the scikit-learn encoder. #1002 by Guillaume Lemaitre.
- Handle sparse matrices in SMOTEN and raise a warning, since it requires a conversion to dense matrices. #1003 by Guillaume Lemaitre.
- Remove a spurious warning raised when the minority class is over-sampled to more samples than the majority class contains. #1007 by Guillaume Lemaitre.
- The fitted attribute ohe_ in SMOTENC is deprecated and will be removed in version 0.13. Use categorical_encoder_ instead. #1000 by Guillaume Lemaitre.
- The defaults of the parameters sampling_strategy and replacement will change in BalancedRandomForestClassifier to follow the implementation of the original paper. This change will take effect in version 0.13. #1006 by Guillaume Lemaitre.
- SMOTENC now accepts a parameter categorical_encoder allowing a OneHotEncoder with custom parameters to be specified. #1000 by Guillaume Lemaitre.
- SMOTEN now accepts a parameter categorical_encoder allowing an OrdinalEncoder with custom parameters to be specified. A new fitted attribute categorical_encoder_ exposes the fitted encoder. #1001 by Guillaume Lemaitre.
- RandomUnderSampler and RandomOverSampler (when shrinkage is not None) now accept any data type and will not attempt any data conversion. #1004 by Guillaume Lemaitre.
- SMOTENC now supports passing an array-like of str for the categorical_features parameter. #1008 by Guillaume Lemaitre.
- SMOTENC now supports automatic categorical inference when categorical_features is set to "auto". #1009 by Guillaume Lemaitre.
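The "auto" mode flags which columns hold categorical data. A stdlib-only toy illustrating the idea (SMOTENC itself infers this from pandas dtypes; the function name below is hypothetical, not part of the imbalanced-learn API):

```python
# Toy sketch of automatic categorical-feature inference: columns whose
# values are not all numeric are flagged as categorical. SMOTENC relies
# on pandas dtypes instead; this only mimics the concept on plain rows.

def infer_categorical_columns(rows):
    """Return indices of columns containing any non-numeric value."""
    n_cols = len(rows[0])
    categorical = []
    for j in range(n_cols):
        if any(not isinstance(row[j], (int, float)) for row in rows):
            categorical.append(j)
    return categorical

rows = [
    [1.5, "red", 10],
    [2.0, "blue", 12],
    [0.3, "red", 9],
]
print(infer_categorical_columns(rows))  # column 1 holds strings -> [1]
```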
Published by glemaitre over 1 year ago
- Fix a bug where minority was rejected as an invalid sampling strategy. #964 by Prakhyath07.

Published by glemaitre almost 2 years ago

- Fix compatibility with python -OO, which replaces docstrings by None. #953 by Guillaume Lemaitre.
- Add feature_names_in_ as well as get_feature_names_out for all samplers. #959 by Guillaume Lemaitre.
- The parameter n_jobs has been deprecated from the classes ADASYN, BorderlineSMOTE, SMOTE, SMOTENC, SMOTEN, and SVMSMOTE. Instead, pass a nearest neighbors estimator where n_jobs is set. #887 by Guillaume Lemaitre.
- The parameter base_estimator is deprecated and will be removed in version 0.12. This impacts the following classes: BalancedBaggingClassifier, EasyEnsembleClassifier, RUSBoostClassifier. #946 by Guillaume Lemaitre.

Published by glemaitre over 2 years ago
Compatibility with scikit-learn 1.1.0
Published by glemaitre almost 3 years ago
Compatibility with scikit-learn 1.0.2
Published by glemaitre about 3 years ago
September 29, 2021
Make imbalanced-learn compatible with scikit-learn 1.0. #864 by Guillaume Lemaitre.
Published by glemaitre over 3 years ago
February 18, 2021
- Add imblearn.metrics.macro_averaged_mean_absolute_error returning the average across classes of the MAE. This metric is used in ordinal classification. #780 by Aurélien Massiot.
- Add imblearn.metrics.pairwise.ValueDifferenceMetric to compute pairwise distances between samples containing only categorical values. #796 by Guillaume Lemaitre.
- Add imblearn.over_sampling.SMOTEN to over-sample data containing only categorical features. #802 by Guillaume Lemaitre.
- Allow passing any type of sampler in imblearn.ensemble.BalancedBaggingClassifier, unlocking the implementation of methods based on resampled bagging. #808 by Guillaume Lemaitre.
- Add the parameter output_dict in imblearn.metrics.classification_report_imbalanced to return a dictionary instead of a string. #770 by Guillaume Lemaitre.
- Fix a bug in imblearn.under_sampling.ClusterCentroids where voting="hard" could have led to selecting a sample from any class instead of the targeted class. #769 by Guillaume Lemaitre.
- Fix a bug in imblearn.FunctionSampler where validation was performed even with validate=False when calling fit. #790 by Guillaume Lemaitre.
- Update extras_require within the setup.py file. #816 by Guillaume Lemaitre.
- The documentation now uses pydata-sphinx-theme. #801 by Guillaume Lemaitre.
- imblearn.utils.testing.warns is deprecated in 0.8 and will be removed in 1.0. #815 by Guillaume Lemaitre.
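The macro-averaged MAE added in 0.8 above computes the MAE separately on the samples of each true class and then averages across classes, so the majority class cannot dominate the score. A stdlib-only sketch (illustrative, not the imblearn implementation):

```python
# Macro-averaged mean absolute error: per-class MAE, averaged over classes.
# Useful for ordinal classification on imbalanced labels.

def macro_averaged_mae(y_true, y_pred):
    """Average, across classes, of the MAE computed on each true class."""
    classes = sorted(set(y_true))
    per_class = []
    for c in classes:
        errors = [abs(t - p) for t, p in zip(y_true, y_pred) if t == c]
        per_class.append(sum(errors) / len(errors))
    return sum(per_class) / len(per_class)

y_true = [0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 1, 1]
# Class 0 MAE = 0, class 1 MAE = 0, class 2 MAE = 1 -> macro average = 1/3.
print(macro_averaged_mae(y_true, y_pred))
```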
A release to bump the minimum version of scikit-learn to 0.23 with a couple of bug fixes.
Check the what's new for more information.
Published by glemaitre over 4 years ago
This is a bug-fix release to resolve some issues regarding the handling of the input and output formats of the arrays.
Published by glemaitre almost 5 years ago
This is a bug-fix release to primarily resolve some packaging issues in version 0.6.0. It also includes minor documentation improvements and some bug fixes.
Fix a bug in imblearn.ensemble.BalancedRandomForestClassifier leading to a wrong number of samples used during fitting due to max_samples and, therefore, a bad computation of the OOB score. :pr:`656` by :user:`Guillaume Lemaitre <glemaitre>`.

Published by glemaitre almost 5 years ago
Changed models
..............
The following models might give different sampling results due to changes in
scikit-learn:
imblearn.under_sampling.ClusterCentroids
imblearn.under_sampling.InstanceHardnessThreshold
The following samplers will give different results due to changes linked to
the internal usage of the random state:
imblearn.over_sampling.SMOTENC
Bug fixes
.........
:class:`imblearn.under_sampling.InstanceHardnessThreshold` now takes into
account random_state and will give deterministic results. In addition,
cross_val_predict is used to take advantage of the parallelism.
:pr:`599` by :user:`Shihab Shahriar Khan <Shihab-Shahriar>`.
Fix a bug in :class:`imblearn.ensemble.BalancedRandomForestClassifier`
leading to a wrong computation of the OOB score.
:pr:`656` by :user:`Guillaume Lemaitre <glemaitre>`.
Maintenance
...........
Update imports from scikit-learn after some modules were made private.
The following imports have been changed:
:class:`sklearn.ensemble._base._set_random_states`,
:class:`sklearn.ensemble._forest._parallel_build_trees`,
:class:`sklearn.metrics._classification._check_targets`,
:class:`sklearn.metrics._classification._prf_divide`,
:class:`sklearn.utils.Bunch`,
:class:`sklearn.utils._safe_indexing`,
:class:`sklearn.utils._testing.assert_allclose`,
:class:`sklearn.utils._testing.assert_array_equal`,
:class:`sklearn.utils._testing.SkipTest`.
:pr:`617` by :user:`Guillaume Lemaitre <glemaitre>`.
Synchronize :mod:`imblearn.pipeline` with :mod:`sklearn.pipeline`.
:pr:`620` by :user:`Guillaume Lemaitre <glemaitre>`.
Synchronize :class:`imblearn.ensemble.BalancedRandomForestClassifier` and add
the parameters max_samples and ccp_alpha.
:pr:`621` by :user:`Guillaume Lemaitre <glemaitre>`.
Enhancement
...........
:class:`imblearn.under_sampling.RandomUnderSampler`,
:class:`imblearn.over_sampling.RandomOverSampler`, and
:class:`imblearn.datasets.make_imbalance` accept a pandas DataFrame as input
and will output a pandas DataFrame. Similarly, they accept a pandas Series as
input and will output a pandas Series.
:pr:`636` by :user:`Guillaume Lemaitre <glemaitre>`.
:class:`imblearn.FunctionSampler` accepts a parameter validate allowing the
input X and y to be validated or not.
:pr:`637` by :user:`Guillaume Lemaitre <glemaitre>`.
:class:`imblearn.under_sampling.RandomUnderSampler` and
:class:`imblearn.over_sampling.RandomOverSampler` can resample when
non-finite values are present in X.
:pr:`643` by :user:`Guillaume Lemaitre <glemaitre>`.
All samplers will output a pandas DataFrame if a pandas DataFrame was given
as input.
:pr:`644` by :user:`Guillaume Lemaitre <glemaitre>`.
The sample generation in
:class:`imblearn.over_sampling.SMOTE`,
:class:`imblearn.over_sampling.BorderlineSMOTE`,
:class:`imblearn.over_sampling.SVMSMOTE`,
:class:`imblearn.over_sampling.KMeansSMOTE`, and
:class:`imblearn.over_sampling.SMOTENC` is now vectorized, giving an
additional speed-up when X is sparse.
:pr:`596` by :user:`Matt Eding <MattEding>`.
Deprecation
...........
The following classes have been removed after 2 deprecation cycles:
ensemble.BalanceCascade and ensemble.EasyEnsemble.
:pr:`617` by :user:`Guillaume Lemaitre <glemaitre>`.
The following function has been removed after 2 deprecation cycles:
utils.check_ratio.
:pr:`617` by :user:`Guillaume Lemaitre <glemaitre>`.
The parameters ratio and return_indices have been removed from all samplers.
:pr:`617` by :user:`Guillaume Lemaitre <glemaitre>`.
The parameters m_neighbors, out_step, kind, and svm_estimator have been
removed from :class:`imblearn.over_sampling.SMOTE`.
:pr:`617` by :user:`Guillaume Lemaitre <glemaitre>`.
Published by glemaitre over 5 years ago
The following models or functions might give different results even if the
same data X and y are used:

- imblearn.ensemble.RUSBoostClassifier: the default estimator changed from
  sklearn.tree.DecisionTreeClassifier with full depth to a decision stump
  (max_depth=1).
- Correct the definition of the ratio when using a float as the sampling
  strategy for over-sampling and under-sampling.
  :issue:`525` by :user:`Ariel Rossanigo <arielrossanigo>`.
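The corrected float sampling strategy can be read as the desired ratio of minority to majority samples after resampling. A minimal stdlib-only sketch of that arithmetic (the function names here are illustrative, not part of the imbalanced-learn API):

```python
# Float sampling_strategy = N_minority / N_majority AFTER resampling.
# Over-sampling grows the minority class; under-sampling shrinks the
# majority class; the other class count stays fixed in each case.

def oversample_target(n_minority, n_majority, ratio):
    """Minority sample count after over-sampling to the given ratio."""
    return int(ratio * n_majority)

def undersample_target(n_minority, n_majority, ratio):
    """Majority sample count after under-sampling to the given ratio."""
    return int(n_minority / ratio)

print(oversample_target(10, 100, 0.5))   # minority grown to 50
print(undersample_target(10, 100, 0.5))  # majority cut to 20
```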
Add :class:`imblearn.over_sampling.BorderlineSMOTE` and
:class:`imblearn.over_sampling.SVMSMOTE` in the API documentation.
:issue:`530` by :user:`Guillaume Lemaitre <glemaitre>`.
Add parallelisation for SMOTEENN and SMOTETomek.
:pr:`547` by :user:`Michael Hsieh <Microsheep>`.
Add :class:`imblearn.utils._show_versions`. Updated the contribution guide
and issue template showing how to print system and dependency information
from the command line.
:pr:`557` by :user:`Alexander L. Hayes <batflyer>`.
Add :class:`imblearn.over_sampling.KMeansSMOTE`, an over-sampler clustering
points before applying SMOTE.
:pr:`435` by :user:`Stephan Heijl <StephanHeijl>`.
Make it possible to import imblearn and access its submodules.
:pr:`500` by :user:`Guillaume Lemaitre <glemaitre>`.
Remove support for Python 2 and remove deprecation warnings from
scikit-learn 0.21.
:pr:`576` by :user:`Guillaume Lemaitre <glemaitre>`.
Fix wrong usage of :class:`keras.layers.BatchNormalization` in the
porto_seguro_keras_under_sampling.py example. The batch normalization
was moved before the activation function and the bias was removed from the
dense layer.
:pr:`531` by :user:`Guillaume Lemaitre <glemaitre>`.
Fix a bug which converted sparse matrices to COO format when stacking them in
:class:`imblearn.over_sampling.SMOTENC`. This bug affected only old scipy
versions.
:pr:`539` by :user:`Guillaume Lemaitre <glemaitre>`.
Fix a bug in :class:`imblearn.pipeline.Pipeline` where None could be the
final estimator.
:pr:`554` by :user:`Oliver Rausch <orausch>`.
Fix a bug in :class:`imblearn.over_sampling.SVMSMOTE` and
:class:`imblearn.over_sampling.BorderlineSMOTE` where the default parameter
n_neighbors was not set properly.
:pr:`578` by :user:`Guillaume Lemaitre <glemaitre>`.
Fix a bug by changing the default depth in
:class:`imblearn.ensemble.RUSBoostClassifier` to get a decision stump as a
weak learner, as in the original paper.
:pr:`545` by :user:`Christos Aridas <chkoar>`.
Allow importing keras directly from tensorflow in :mod:`imblearn.keras`.
:pr:`531` by :user:`Guillaume Lemaitre <glemaitre>`.
Published by glemaitre almost 6 years ago
Mainly bug fixes in SMOTENC
Published by glemaitre almost 6 years ago
Version 0.4.2
Bug fixes
Published by glemaitre about 6 years ago
October, 2018
.. warning::
   Version 0.4 is the last version of imbalanced-learn to support Python 2.7
   and Python 3.4. Imbalanced-learn 0.5 will require Python 3.5 or higher.
This release brings its set of new features as well as some API changes to
strengthen the foundation of imbalanced-learn.
As a new feature, 2 new modules, imblearn.keras and imblearn.tensorflow,
have been added, in which imbalanced-learn samplers can be used to generate
balanced mini-batches.
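The balanced mini-batch idea behind these generators can be sketched without any deep-learning dependency: each batch draws the same number of samples (with replacement) from every class. The function below is an illustrative toy, not the actual imblearn.keras API:

```python
# Toy balanced mini-batch generator: each batch contains an equal number
# of indices per class, sampled with replacement, so a downstream model
# sees balanced batches even on heavily imbalanced labels.

import random
from collections import defaultdict

def balanced_batches(y, batch_size, n_batches, seed=0):
    """Yield lists of sample indices with equal class representation."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    per_class = batch_size // len(by_class)
    for _ in range(n_batches):
        batch = []
        for indices in by_class.values():
            batch.extend(rng.choices(indices, k=per_class))
        yield batch

y = [0] * 90 + [1] * 10          # heavily imbalanced labels
batch = next(balanced_batches(y, batch_size=8, n_batches=1))
print([y[i] for i in batch])     # four samples from each class
```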
The module imblearn.ensemble has been consolidated with new classifiers:
imblearn.ensemble.BalancedRandomForestClassifier,
imblearn.ensemble.EasyEnsembleClassifier, and
imblearn.ensemble.RUSBoostClassifier.
Support for string data has been added in
imblearn.over_sampling.RandomOverSampler and
imblearn.under_sampling.RandomUnderSampler. In addition, a new class,
imblearn.over_sampling.SMOTENC, allows generating samples for datasets
containing both continuous and categorical features.
The imblearn.over_sampling.SMOTE has been simplified and broken down
into 2 additional classes:
imblearn.over_sampling.SVMSMOTE and
imblearn.over_sampling.BorderlineSMOTE.
There are also some changes regarding the API:
the parameter sampling_strategy has been introduced to replace the
ratio parameter. In addition, the return_indices argument has been
deprecated and all samplers expose a sample_indices_ attribute whenever
this is possible.