
MILBoost and other boosting algorithms, compatible with scikit-learn

MIT License



Boosting Algorithms compatible with scikit-learn.

Boosting Algorithms

The skboost package contains implementations of some boosting algorithms that are outside the scope of scikit-learn.

The main point of interest is the MILBoost algorithm, which applies boosting to the Multiple Instance Learning setting, where labels are attached to groups ("bags") of instances rather than to individual instances.
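In Multiple Instance Learning, per-instance scores have to be combined into one score per bag (a labelled group of instances). One combination rule commonly associated with MILBoost is noisy-OR, sketched below. This is an illustration only: the function name `noisy_or` is hypothetical, it is not taken from the skboost API, and this README does not state which rule the package uses.

```python
import math


def noisy_or(instance_probs):
    """Combine per-instance probabilities into a bag probability.

    Noisy-OR rule: P(bag positive) = 1 - prod_i (1 - p_i).
    `noisy_or` is a hypothetical helper, not a skboost function.
    """
    return 1.0 - math.prod(1.0 - p for p in instance_probs)


# A bag with two uncertain instances is more likely positive than either alone.
bag_probability = noisy_or([0.5, 0.5])
```

Under this rule, a single confident instance is enough to make the whole bag confidently positive, which matches the "positive if any instance is positive" intuition behind the setting.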


See [1], [2] and [4].


See [3].


See [3].


This repository includes a vendored copy of the MUSK datasets ([5]), versions 1 and 2, which are used as multiple instance learning benchmarks. The dataset documentation describes version 1 as follows:

This dataset describes a set of 92 molecules of which 47 are judged by human experts to be musks and the remaining 45 molecules are judged to be non-musks. The goal is to learn to predict whether new molecules will be musks or non-musks. However, the 166 features that describe these molecules depend upon the exact shape, or conformation, of the molecule. Because bonds can rotate, a single molecule can adopt many different shapes. To generate this data set, the low-energy conformations of the molecules were generated and then filtered to remove highly similar conformations. This left 476 conformations. Then, a feature vector was extracted that describes each conformation.

This many-to-one relationship between feature vectors and molecules is called the "multiple instance problem". When learning a classifier for this data, the classifier should classify a molecule as "musk" if ANY of its conformations is classified as a musk. A molecule should be classified as "non-musk" if NONE of its conformations is classified as a musk.
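The bag-level rule described above (a molecule is a musk if any of its conformations is classified as a musk) can be sketched in a few lines of plain Python. The function name `bag_labels` and the molecule identifiers are hypothetical, chosen for illustration; they are not part of skboost or of the dataset.

```python
from collections import defaultdict


def bag_labels(instance_preds, bag_ids):
    """Return {bag_id: True/False}, True if ANY instance in the bag is positive.

    `instance_preds` holds one prediction per conformation (truthy = musk);
    `bag_ids` holds the molecule each conformation belongs to.
    Hypothetical helper for illustration only.
    """
    bags = defaultdict(bool)  # every bag starts out negative
    for bag, pred in zip(bag_ids, instance_preds):
        bags[bag] = bags[bag] or bool(pred)
    return dict(bags)


# Five conformations belonging to three made-up molecules.
preds = [0, 1, 0, 0, 0]
molecules = ["m1", "m1", "m2", "m2", "m3"]
labels = bag_labels(preds, molecules)
```

Here `m1` is labelled a musk because one of its two conformations is predicted positive, while `m2` and `m3` are labelled non-musk because none of theirs are.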


