optuml

Optuna-optimized ML methods, with scikit-learn like API

MIT License

Downloads
896
Stars
1

OptuML: Hyperparameter Optimization for Multiple Machine Learning Algorithms using Optuna

  ⣰⡁ ⡀⣀ ⢀⡀ ⣀⣀    ⡎⢱ ⣀⡀ ⣰⡀ ⡀⢀ ⡷⢾ ⡇    ⠄ ⣀⣀  ⣀⡀ ⢀⡀ ⡀⣀ ⣰⡀   ⡎⢱ ⣀⡀ ⣰⡀ ⠄ ⣀⣀  ⠄ ⣀⣀ ⢀⡀ ⡀⣀
  ⢸  ⠏  ⠣⠜ ⠇⠇⠇   ⠣⠜ ⡧⠜ ⠘⠤ ⠣⠼ ⠇⠸ ⠧⠤   ⠇ ⠇⠇⠇ ⡧⠜ ⠣⠜ ⠏  ⠘⠤   ⠣⠜ ⡧⠜ ⠘⠤ ⠇ ⠇⠇⠇ ⠇ ⠴⠥ ⠣⠭ ⠏ 

OptuML (for Optuna and ML) is a Python module that provides hyperparameter optimization for several machine learning algorithms using the Optuna framework. The module supports a variety of algorithms and allows easy hyperparameter tuning through a scikit-learn-like API.

Input                  OptuML train                           Predict                        
┌─────────────────┐    ┌──────────────────────────────────┐   ┌─────────────────────────────┐
│X_train, y_train ┼────► clf = Optimizer(algorithm="SVC") ├───► y_pred = clf.predict(X_test)│
└─────────────────┘    │ clf.fit(X_train, y_train)        │   │                             │
┌─────────────────┐    └─▲────────────────────────────────┘   └─────────────────────────▲───┘
│ML algorithm     ├──────┘                                                              │    
└─────────────────┘                                                            X_test───┘    

Features

  • Multiple Algorithms: Supports hyperparameter optimization for the following algorithms:
    • Scikit-learn zoo, plus:
    • CatBoost
    • XGBoost
  • Optuna Framework: Leverages Optuna for powerful hyperparameter search.
  • Maximize or Minimize: Allows setting the optimization direction (maximize or minimize).
  • Scikit-learn API: Provides a consistent interface for fit(), predict(), predict_proba(), and score() methods.
  • Control Output: Optionally run Optuna with granular verbosity settings (verbose as bool or int).
  • Cross-validation: Easily integrate cross-validation with custom scoring metrics (e.g., accuracy, ROC AUC).

Installation

a) pip

pip install optuml

or upgrade with:

pip install optuml --upgrade

b) Manual way

You can install the required packages via pip:

pip install optuna scikit-learn catboost xgboost numpy wrapt_timeout_decorator

Next just fetch the optuml.py file from the repo and put it in the directory with your script.

Usage

Basic Example

Here’s how you can use the Optimizer class to optimize hyperparameters for different machine learning algorithms using the Iris dataset:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from OptuML import Optimizer

# Load the Iris dataset
X, y = load_iris(return_X_y=True)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Instantiate the optimizer for SVC
optimizer = Optimizer(algorithm="SVC", n_trials=50, cv=3, scoring="accuracy", verbose=True)

# Fit the optimizer to the training data
optimizer.fit(X_train, y_train)

# Predict on the test set
y_pred = optimizer.predict(X_test)

# Calculate the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")

# Print the best hyperparameters found during optimization
print(f"Best Hyperparameters: {optimizer.best_params_}")

Available Algorithms

The Optimizer class supports the following algorithms. You can specify the algorithm parameter to choose which one to use:

Classifiers
Algorithm Type
AdaBoostClassifier AdaBoost Classifier
CatBoostClassifier CatBoost Classifier
GaussianNB Gaussian Naive Bayes
KNeighborsClassifier k-Nearest Neighbors Classifier
MLPClassifier Multi-layer Perceptron Classifier
RandomForestClassifier Random Forest Classifier
SVC Support Vector Classifier
XGBClassifier XGBoost Classifier
QDA Quadratic Discriminant Analysis
Regressors
Algorithm Type
AdaBoostRegressor AdaBoost Regressor
CatBoostRegressor CatBoost Regressor
KNeighborsRegressor k-Nearest Neighbors Regressor
MLPRegressor Multi-layer Perceptron Regressor
RandomForestRegressor Random Forest Regressor
SVR Support Vector Regressor
XGBRegressor XGBoost Regressor

Controlling Verbosity

You can control the verbosity of Optuna's output by using the verbose parameter:

  • Set verbose=True for standard logging.
  • Use an int value to specify more granular verbosity levels (e.g., optuna.logging.DEBUG).
optimizer = Optimizer(algorithm="SVC", n_trials=50, cv=3, scoring="accuracy", verbose=True)
  • and/or show a progress bar:
optimizer = Optimizer(algorithm="SVC", n_trials=50, cv=3, scoring="accuracy", show_progress_bar=True)

API Reference

Optimizer

Parameters

  • algorithm (str): The machine learning algorithm to optimize.
  • direction (str, default "maximize"): Direction of optimization. Can be "maximize" or "minimize".
  • verbose (bool or int, default False): Controls Optuna's verbosity.
  • n_trials (int, default 100): Number of optimization trials to run.
  • timeout (float, optional): Maximum time (in seconds) for the optimization process.
  • cv (int, default 5): Number of cross-validation folds.
  • scoring (str, default "accuracy"): Scoring metric to use during cross-validation.
  • random_state (int, optional): Seed for random number generation.
  • cv_timeout (int, default 120) Timeout for a signle cv process within a trial

Methods

  • fit(X, y): Fit the model using hyperparameter optimization.
  • predict(X): Make predictions using the best model found during optimization.
  • predict_proba(X): Predict class probabilities (if supported by the model).
  • score(X, y): Score the model using the test data.
Package Rankings
Top 34.28% on Pypi.org
Badges
Extracted from project README's
Python manual install Python pip install pypi version
Related Projects