Open Source Ecosystems

Overview

Shapash is a Python library designed to make machine learning interpretable and comprehensible for everyone. It offers various visualizations with clear and explicit labels that are easily understood by all.

With Shapash, you can generate a Webapp that simplifies the comprehension of interactions between the model's features, and allows seamless navigation between local and global explainability. This Webapp enables Data Scientists to effortlessly understand their models and share their results with both data scientists and non-data experts.

Additionally, Shapash contributes to data science auditing by presenting valuable information about any model and data in a comprehensive report.

Shapash is suitable for Regression, Binary Classification and Multiclass problems. It is compatible with numerous models, including Catboost, Xgboost, LightGBM, Sklearn Ensemble, Linear models, and SVM. For other models, solutions to integrate Shapash are available; more details can be found here.

[!NOTE] If you want to give us feedback: Feedback form

Shapash App Demo

Documentation and resources

What's new?

Version	New Feature	Description
2.3.x	Additional dataset columns New demo Article	In Webapp: Target and error columns added to dataset and possibility to add features outside the model for more filtering options
2.3.x	Identity card New demo Article	In Webapp: New identity card to summarize the information of the selected sample
2.2.x	Picking samples Article	New tab in the webapp for picking samples. The graph represents the "True Values Vs Predicted Values"
2.2.x	Dataset Filter	New tab in the webapp to filter data, plus several improvements in the webapp: subtitles, labels, screen adjustments
2.0.x	Refactoring Shapash	Refactoring attributes of compile methods and init. Refactoring implementation for new backends
1.7.x	Variabilize Colors	Lets you use your own colour palette so that outputs match your design
1.6.x	Explainability Quality Metrics Article	To help increase confidence in explainability methods, you can evaluate the relevance of your explainability using 3 metrics: Stability, Consistency and Compacity
1.4.x	Groups of features Demo	You can now regroup features that share common properties together. This option can be useful if your model has a lot of features.
1.3.x	Shapash Report Demo	A standalone HTML report that constitutes a basis of an audit document.

Features

Display clear and understandable results: plots and outputs use explicit labels for each feature and its values

Allow Data Scientists to quickly understand their models using a webapp to easily navigate between global and local explainability, and understand how the different features contribute: Live Demo Shapash-Monitor
Summarize and export local explanation

Shapash provides concise and clear local explanations, enabling users of any data background to understand a local prediction of a supervised model through a summarized and explicit explanation.

Evaluate the quality of your explainability with various metrics
Effortlessly share and discuss results with non-Data users
Select subsets for in-depth analysis of explainability by filtering based on explanatory and additional features, as well as correct or wrong predictions. Picking Examples to Understand Machine Learning Model
Deploy the interpretability part of your project, from model training to deployment (API or batch mode)
Contribute to the auditability of your model by generating a standalone HTML report of your projects. Report Example

We believe that this report will offer valuable support for auditing models and data, leading to improved AI governance. Data Scientists can now provide anyone interested in their project with a document that captures various aspects of their work as the foundation for an audit report. This document can be easily shared among teams (internal audit, DPO, risk, compliance...).

How Shapash works

Shapash is an overlay package for libraries focused on model interpretability. It uses a Shap or Lime backend to compute contributions. Shapash builds upon the various steps required to create a machine learning model, making the results more understandable.

Shapash is suitable for Regression, Binary Classification and Multiclass problems. It is compatible with numerous models: Catboost, Xgboost, LightGBM, Sklearn Ensemble, Linear models, SVM.

If your model is not in the list of compatible models, it is possible to provide Shapash with local contributions calculated with shap or another method. Here's an example of how to provide contributions to Shapash. An issue has been created to enhance this use case.
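To make the idea of "local contributions calculated with another method" concrete, here is a minimal sketch for a hypothetical linear model, where each feature's contribution is taken as its coefficient times the distance of its value from the dataset mean. The model, coefficients, feature names and helper functions are all assumptions made for illustration. The result has one row per sample and one value per feature, which is the general shape of a contributions table; the linked example should be checked for the exact format Shapash expects.

```python
# Hypothetical linear model: prediction = intercept + sum(coef[f] * x[f]).
coefficients = {"surface": 1200.0, "rooms": 5000.0}
intercept = 20000.0

# Two hypothetical rows of the dataset to explain.
rows = [
    {"surface": 80.0, "rooms": 3.0},
    {"surface": 120.0, "rooms": 5.0},
]


def feature_means(rows):
    """Average value of each feature over the dataset."""
    return {f: sum(r[f] for r in rows) / len(rows) for f in rows[0]}


def linear_contributions(rows, coefficients):
    """Per-row, per-feature contribution: coef * (value - mean)."""
    means = feature_means(rows)
    return [
        {f: coefficients[f] * (r[f] - means[f]) for f in coefficients}
        for r in rows
    ]


def predict(row):
    """Prediction of the hypothetical linear model."""
    return intercept + sum(coefficients[f] * row[f] for f in coefficients)


contributions = linear_contributions(rows, coefficients)
for row, contrib in zip(rows, contributions):
    print(predict(row), contrib)
```

With this definition, the contributions of a row sum to the gap between its prediction and the average prediction, which is the additive property that makes such values usable as local explanations.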

Shapash can use a category-encoders object, a sklearn ColumnTransformer, or simply a features dictionary.

Category_encoder: OneHotEncoder, OrdinalEncoder, BaseNEncoder, BinaryEncoder, TargetEncoder
Sklearn ColumnTransformer: OneHotEncoder, OrdinalEncoder, StandardScaler, QuantileTransformer, PowerTransformer

Installation

Shapash is intended to work with Python versions 3.9 to 3.12. Installation can be done with pip:

pip install shapash

Generating the Shapash Report requires some extra dependencies. You can install them with the following command:

pip install shapash[report]
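Since the supported range is Python 3.9 to 3.12, a quick check of the interpreter before installing can save some debugging. The sketch below uses only the standard library; the function name and the warning message are illustrative and not part of Shapash.

```python
import sys

# Supported range taken from the text above: Python 3.9 to 3.12 inclusive.
MIN_SUPPORTED = (3, 9)
MAX_SUPPORTED = (3, 12)


def is_supported(version=None):
    """Return True if the (major, minor) version is within the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if not is_supported():
    print("Warning: this Python version is outside the 3.9 to 3.12 range.")
```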

If you encounter compatibility issues you may check the corresponding section in the Shapash documentation here.

Quickstart

The steps to display results, generate a report and deploy:

Step 1: Declare SmartExplainer Object

There is 1 mandatory parameter when declaring the SmartExplainer: the model. You can also declare a features dict here to specify the labels to display.

from shapash import SmartExplainer

xpl = SmartExplainer(
    model=regressor,
    features_dict=house_dict,  # Optional parameter
    preprocessing=encoder,  # Optional: compile step can use inverse_transform method
    postprocessing=postprocess,  # Optional: see tutorial postprocessing
)
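The snippet above references house_dict and postprocess without showing them. As a rough sketch, the features dictionary maps technical column names to readable labels, and postprocessing is assumed here to map a feature to a "type"/"rule" pair as in the postprocessing tutorial. The column names, labels, rules and the display_row helper are hypothetical; the helper only imitates the effect on displayed values and is not how Shapash implements it.

```python
# Hypothetical features dictionary: technical column name -> display label.
house_dict = {
    "GrLivArea": "Ground living area square feet",
    "OverallQual": "Overall material and finish of the house",
}

# Hypothetical postprocessing rules; the "type"/"rule" layout is an assumption
# to be checked against the postprocessing tutorial.
postprocess = {
    "GrLivArea": {"type": "suffix", "rule": " sq ft"},
}


def display_row(row, features_dict, postprocessing):
    """Illustration only: apply labels and prefix/suffix rules to one row."""
    shown = {}
    for name, value in row.items():
        rule = postprocessing.get(name)
        if rule and rule["type"] == "suffix":
            value = f"{value}{rule['rule']}"
        elif rule and rule["type"] == "prefix":
            value = f"{rule['rule']}{value}"
        # Fall back to the technical name when no label is declared.
        shown[features_dict.get(name, name)] = value
    return shown


print(display_row({"GrLivArea": 1710, "OverallQual": 7}, house_dict, postprocess))
```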

Step 2: Compile Dataset, ...

There is 1 mandatory parameter in the compile method: the dataset.

xpl.compile(
    x=xtest,
    y_pred=y_pred,  # Optional: for your own prediction (by default: model.predict)
    y_target=ytest,  # Optional: allows to display True Values vs Predicted Values
    additional_data=xadditional,  # Optional: additional dataset of features for Webapp
    additional_features_dict=features_dict_additional,  # Optional: dict additional data
)

Step 3: Display output

There are several outputs and plots available. For example, you can launch the web app:

app = xpl.run_app()

Live Demo Shapash-Monitor

Step 4: Generate the Shapash Report

This step generates a standalone HTML report of your project, using the different splits of your dataset and the metrics you used:

xpl.generate_report(
    output_file="path/to/output/report.html",
    project_info_file="path/to/project_info.yml",
    x_train=xtrain,
    y_train=ytrain,
    y_test=ytest,
    title_story="House prices report",
    title_description="""This document is a data science report of the kaggle house prices tutorial project.
        It was generated using the Shapash library.""",
    metrics=[{"name": "MSE", "path": "sklearn.metrics.mean_squared_error"}],
)
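Each entry in metrics names a function by its import path rather than passing the function itself. The sketch below shows one common way such a dotted path can be resolved to a callable with the standard library. It is an illustration of the idea, not necessarily how Shapash resolves it, and resolve_callable is a hypothetical helper. A standard-library function stands in for sklearn.metrics.mean_squared_error so the example runs without extra packages.

```python
import importlib


def resolve_callable(path):
    """Import 'package.module.function' and return the function object."""
    module_path, _, attribute = path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, attribute)


# Same layout as a metrics entry, pointing at a standard-library function.
metric = {"name": "Mean", "path": "statistics.mean"}
score = resolve_callable(metric["path"])([1.0, 2.0, 3.0])
print(metric["name"], score)
```

Passing a path keeps the report configuration serializable, at the cost that a typo in the path only surfaces when the function is imported.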

Report Example

Step 5: From training to deployment: SmartPredictor Object

Shapash provides a SmartPredictor object to deploy summarized local explanations for operational needs. It is an object dedicated to deployment, lighter than SmartExplainer and with additional consistency checks. SmartPredictor can be used with an API or in batch mode. It provides predictions and detailed or summarized local explainability using appropriate wording.

predictor = xpl.to_smartpredictor()

See the tutorials to learn how to use the SmartPredictor object.
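To illustrate what a "summarized" local explanation means in practice, the sketch below keeps only the features with the largest absolute contributions for one prediction and phrases them in plain wording. This is a conceptual illustration in plain Python, not the SmartPredictor API; the summarize function, its max_contrib argument and the contribution values are assumptions.

```python
def summarize(contributions, max_contrib=3):
    """Keep the max_contrib features with the largest absolute contribution."""
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:max_contrib]


# Hypothetical local contributions for a single prediction.
local = {
    "GrLivArea": 12500.0,
    "OverallQual": 8200.0,
    "YearBuilt": -3100.0,
    "LotArea": 400.0,
}

for name, value in summarize(local, max_contrib=2):
    direction = "increases" if value > 0 else "decreases"
    print(f"{name} {direction} the prediction by {abs(value):.0f}")
```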

Tutorials

This GitHub repository offers many tutorials to help you get started with Shapash.

Using postprocessing parameter in compile method

Building confidence on explainability methods using Stability, Consistency and Compacity metrics

Generate a standalone HTML report of your project with generate_report

Add features outside of the model for more exploration options

Contributors

Awards

Package Rankings

Top 8.17% on Proxy.golang.org

Top 2.88% on Pypi.org

