Model interpretability and understanding for PyTorch
BSD-3-Clause License
The Captum 0.7.0 release adds new functionality for language model attribution and dataset-level attribution, along with a few improvements and bug fixes for existing methods.
Captum 0.7.0 adds new APIs for language model attribution, making it substantially easier to define interpretable text features with corresponding baselines and masks. These new wrappers are compatible with most attribution methods in Captum and help users understand how different aspects of a prompt affect an LLM's generated response. More details can be found in our paper:
Using Captum to Explain Generative Language Models
Example:
```python
from captum.attr import ShapleyValueSampling, LLMAttribution, TextTemplateInput

# model: a loaded causal language model; tokenizer: its matching tokenizer
shapley_values = ShapleyValueSampling(model)
llm_attr = LLMAttribution(shapley_values, tokenizer)

inp = TextTemplateInput(
    # the text template
    "{} lives in {}, {} and is a {}. {} personal interests include",
    # the values of the features
    ["Dave", "Palm Coast", "FL", "lawyer", "His"],
    # the reference baseline values of the features
    baselines=["Sarah", "Seattle", "WA", "doctor", "Her"],
)

res = llm_attr.attribute(inp)
```
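The returned result bundles per-feature scores; a quick sketch of inspecting them, assuming the attribute and method names used in the Captum 0.7.0 tutorials:

```python
# overall attribution of each template feature to the generated sequence
print(res.seq_attr)

# per-output-token breakdown, if available, rendered as a heatmap
res.plot_token_attr(show=True)
```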
DataLoader Attribution is a new wrapper which provides an easy-to-use approach for obtaining attributions over a full dataset by providing a data loader rather than a single input (PR #1155, #1158).
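A minimal sketch of how this might look, assuming a `DataLoaderAttribution` wrapper around a perturbation-based method as introduced in the PRs above; `model` and `dataset` are hypothetical placeholders:

```python
from torch.utils.data import DataLoader
from captum.attr import FeatureAblation, DataLoaderAttribution

# wrap a regular attribution method, then attribute over a whole loader
fa = FeatureAblation(model)        # model: any PyTorch forward function
dl_attr = DataLoaderAttribution(fa)

loader = DataLoader(dataset, batch_size=16)
attr = dl_attr.attribute(loader)   # attributions aggregated batch by batch
```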
Captum 0.7.0 has added a few improvements to existing attribution methods including:
The Captum v0.6.0 release introduces a new feature, StochasticGates. This release also enhances Influential Examples and includes a series of other improvements and bug fixes.
Stochastic Gates is a technique to enforce sparsity by approximating L0 regularization. It can be used for network pruning and feature selection. Since directly optimizing the L0 norm is a non-differentiable combinatorial problem, Stochastic Gates approximates it by using continuous probability distributions (e.g., Concrete, Gaussian) as smoothed Bernoulli distributions, so that the optimization can be reparameterized into the distributions' parameters. Check the following papers for more details:
Captum provides two Stochastic Gates implementations that use different distributions as the smoothed Bernoulli: `BinaryConcreteStochasticGates` and `GaussianStochasticGates`. They are available under `captum.module`, a new subpackage collecting neural network building blocks that are useful for model understanding. A usage example:
```python
import torch
from captum.module import GaussianStochasticGates

n_gates = 5  # number of gates
stg = GaussianStochasticGates(n_gates, reg_weight=0.01)

inputs = torch.randn(3, n_gates)  # mock inputs with batch size of 3
gated_inputs, reg = stg(inputs)   # gate the inputs
loss = model(gated_inputs)        # use gated inputs in the downstream network
# optimize the sparsity regularization together with the model loss
loss += reg
...
# inspect the learned gate values to see how the model uses the inputs
print(stg.get_gate_values())
```
Influential Examples is a new function pillar introduced in the previous version. This release continues to focus on it and brings many improvements to the existing TracInCP family. Some of the changes are incompatible with the previous version. Below is the list of details:
- `mean` in `TracInCPFast` and `TracInCPFastRandProj` (https://github.com/pytorch/captum/pull/913)
- `TracInCP` classes add a new argument `show_progress` to optionally display progress bars for the computation (https://github.com/pytorch/captum/pull/898, https://github.com/pytorch/captum/pull/1046)
- `TracInCP` provides a new public method `self_influence`, which computes the self-influence scores among the examples in the given data. `influence` can no longer compute self-influence scores, and its argument `inputs` cannot be `None` (https://github.com/pytorch/captum/pull/994, https://github.com/pytorch/captum/pull/1069, https://github.com/pytorch/captum/pull/1087, https://github.com/pytorch/captum/pull/1072)
- `influence_src_dataset` in `TracInCP` is renamed to `train_dataset` (https://github.com/pytorch/captum/pull/994)
- `TracInCPFast` and `TracInCPFastRandProj` (https://github.com/pytorch/captum/pull/969)
- `TracInCP` and `TracInCPFastRandProj` provide a new public method `compute_intermediate_quantities`, which computes "embedding" vectors for examples in the given data (https://github.com/pytorch/captum/pull/1068)
- `TracInCP` classes support a new optional argument `test_loss_fn` for use cases where different losses are used for training and testing examples (https://github.com/pytorch/captum/pull/1073)
- `influence`: the arguments `unpack_inputs` and `target` are removed. Now, the `inputs` argument must be a `tuple` where the last element is the label (https://github.com/pytorch/captum/pull/1072)

Other improvements and bug fixes:

- `TCAV`'s output (https://github.com/pytorch/captum/pull/915, https://github.com/pytorch/captum/issues/909)
- `Lime` (https://github.com/pytorch/captum/pull/938, https://github.com/pytorch/captum/issues/910)
- The `captum` module, so users can import `captum` and access everything underneath it, e.g., `captum.attr` (https://github.com/pytorch/captum/pull/912, https://github.com/pytorch/captum/pull/992, https://github.com/pytorch/captum/issues/680)
- Checks in `FeatureAblation` and `FeaturePermutation` to verify the output type of `forward_func` and its shape when `perturbation_per_eval > 1` (https://github.com/pytorch/captum/pull/1047, https://github.com/pytorch/captum/pull/1049, https://github.com/pytorch/captum/pull/1091)
- `tensor` or `tuple[tensor]` (https://github.com/pytorch/captum/pull/1083)
- Tensor `forward_hook` replaces module `backward_hook` for many attribution algorithms that need tensor gradients, like `DeepLift` and `LayerLRP`, so those algorithms can now support models with in-place modules (https://github.com/pytorch/captum/pull/979, https://github.com/pytorch/captum/issues/914)
- A `mask` argument is added to the `FGSM` and `PGD` adversarial attacks under `captum.robust` to specify which elements are perturbed (https://github.com/pytorch/captum/pull/1043)
The Captum v0.5.0 release introduces a new function pillar, Influential Examples, with a few code improvements and bug fixes.
Influential Examples implements the method TracInCP. It calculates the influence score of a given training example on a given test example, approximately answering the question: "if the given training example were removed from the training data, how much would the model's loss on the test example change?" TracInCP can be used for:
Captum currently offers the following variant implementations of TracInCP:
- `TracInCP` - Computes influence scores using gradients at all specified layers. Can be used for identifying proponents/opponents and for identifying mis-labelled data. Both computations take time linear in the training data size.
- `TracInCPFast` - Like `TracInCP`, but computes influence scores using only gradients in the last fully-connected layer, and is expedited using a computational trick.
- `TracInCPFastRandProj` - A version of `TracInCPFast` specialized for computing proponents/opponents. In particular, pre-processing enables computation of proponents/opponents in constant time. The tradeoff is the linear time and memory required for pre-processing. Random projections can be used to reduce memory usage. This class should not be used for identifying mis-labelled data.

A tutorial demonstrates the usage: https://captum.ai/tutorials/TracInCP_Tutorial
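A minimal sketch of identifying proponents with `TracInCPFast`, assuming a hypothetical classifier `model` with final layer `model.fc`, a training dataset `train_ds`, saved checkpoint paths, and a test batch; argument names follow v0.5.0, where the training data argument is `influence_src_dataset` (later renamed `train_dataset`, as noted in the v0.6.0 changes above):

```python
import torch
from captum.influence import TracInCPFast

tracin = TracInCPFast(
    model,                           # hypothetical trained classifier
    final_fc_layer=model.fc,         # last fully-connected layer
    influence_src_dataset=train_ds,  # training Dataset (v0.5.0 name)
    checkpoints=["ckpt_epoch_1.pt", "ckpt_epoch_2.pt"],
    loss_fn=torch.nn.CrossEntropyLoss(),
    batch_size=128,
)

# top-10 proponents (most helpful training examples) for a test batch
proponents, scores = tracin.influence(
    test_inputs, test_labels, k=10, proponents=True
)
```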
Other improvements and bug fixes:

- `model_id` in `TCAV`; removed `AV` from the public concept module (PR #811)
- `attribute_to_layer_input` in `TCAV` can be set for both layer activation and attribution (PR #864)
- Renamed `raw_input` to `raw_input_ids` in the visualization util `VisualizationDataRecord` (PR #804)
- `eps` argument in `DeepLift` (PR #835)
- Use `register_full_backward_hook`, introduced in PyTorch v1.8.0. Attribution to neuron output in `NeuronDeepLift`, `NeuronGuidedBackprop`, and `NeuronDeconvolution` is deprecated and will be removed in the next major release, v0.6.0 (PR #837)
- `tensor([[],[],[]])` (PR #812)
- Fixed `visualization_transform` of `ImageFeature` in Captum Insights not being applied (PR #871)
The Captum v0.4.1 release includes three new tutorials, a few code improvements and bug fixes.
Robustness tutorial:
Concept tutorials:
Code improvements and bug fixes:

- Reduced usage of `Numpy` across the codebase by replacing such usages with `PyTorch` equivalents when possible (PR #714, #755, #760)
- Switched to `ufmt` from the previous `black` + `isort` and reformatted the code accordingly (PR #739)
- Added `captum._utils.av` for TCAV to use, and refactored TCAV to simplify the creation of datasets used to train concept models (PR #747)
- Renamed `safe_div`'s argument `default_value` to `default_denom` and unified its behavior for different denominator types (Issue #654, PR #751)
The Captum 0.4.0 release adds new functionalities for concept-based interpretability, evaluating model robustness, new attribution methods including Layerwise Relevance Propagation (LRP), and improvements to existing attribution methods.
Captum 0.4.0 adds TCAV (Testing with Concept Activation Vectors), allowing users to assess the significance of user-defined concepts for a model's prediction. TCAV has been implemented in a generic manner, allowing users to define custom concepts with example inputs for any modality, including vision and text.
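To give a flavor of the API, here is a minimal sketch under assumed names: `model` is a trained image classifier, `striped_loader` and `random_loader` are hypothetical data loaders yielding example inputs for each concept, and `"layer4"` is an assumed layer name:

```python
from captum.concept import TCAV, Concept

# concepts are defined purely by example inputs
stripes = Concept(id=0, name="striped", data_iter=striped_loader)
random_imgs = Concept(id=1, name="random", data_iter=random_loader)

tcav = TCAV(model, layers=["layer4"])
# significance of the "striped" concept for class 0 predictions
scores = tcav.interpret(
    inputs, experimental_sets=[[stripes, random_imgs]], target=0
)
```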
Captum 0.4.0 also includes new tools to understand model robustness, including implementations of adversarial attacks (Fast Gradient Sign Method and Projected Gradient Descent) as well as robustness metrics to evaluate the impact of different attacks or perturbations on a model. Robustness metrics in this release include:
This robustness tooling enables model developers to better understand potential model vulnerabilities as well as analyze counterfactual examples to better understand a model’s decision boundary.
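For illustration, a minimal sketch of both attacks, assuming a hypothetical classifier `model` and an input batch `inp` with values in [0, 1]:

```python
from captum.robust import FGSM, PGD

# one-step attack: perturb along the sign of the input gradient
fgsm = FGSM(model, lower_bound=0.0, upper_bound=1.0)
adv_fgsm = fgsm.perturb(inp, epsilon=0.1, target=0)

# iterative attack: repeated gradient steps projected back into an L-inf ball
pgd = PGD(model, lower_bound=0.0, upper_bound=1.0)
adv_pgd = pgd.perturb(inp, radius=0.1, step_size=0.02, step_num=7, target=0)
```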
We also add a new attribution method LRP (Layerwise Relevance Propagation) to Captum in the 0.4.0 release, as well as a layer attribution variant, Layer LRP. Layer-wise relevance propagation is based on a backward propagation mechanism applied sequentially to all layers of the model. The model output score represents the initial relevance which is decomposed into values for each neuron of the underlying layers. Thanks to @nanohanno for contributing this method to Captum and @rGure for providing feedback!
We have added new tutorials to demonstrate Captum with BERT, usage of Lime, and DLRM recommender models. These tutorials are:
Additionally, the following fixes and updates to existing tutorials have been added:
Captum 0.4.0 has added improvements to existing attribution methods including:
Captum v0.3.1 includes some improvements and minor fixes beyond the functionalities added in Captum v0.3.0.
Captum v0.3.1 has added improvements to existing attribution methods including:
The third release of Captum, v0.3.0, adds new attribution algorithms including Lime and KernelSHAP, metrics for assessing attribution results including infidelity and sensitivity, and improvements to existing attribution methods.
Captum 0.3.0 adds metrics to estimate the trustworthiness of model explanations. Currently available metrics include Sensitivity-Max and Infidelity.
Infidelity measures the mean squared error between the product of input perturbations with the explanation and the change in the predictor function's output under those perturbations. Sensitivity measures the degree to which the explanation changes under subtle input perturbations, using a Monte Carlo sampling-based approximation. These metrics are available in captum.metrics and documentation can be found here.
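For example, a sketch with a hypothetical model `model` and input batch `inp`; `perturb_fn`, which returns the perturbations together with the perturbed inputs, follows the shape expected by `infidelity`:

```python
import torch
from captum.attr import IntegratedGradients
from captum.metrics import infidelity, sensitivity_max

def perturb_fn(inputs):
    # small Gaussian perturbations; return (perturbations, perturbed inputs)
    noise = torch.randn_like(inputs) * 0.01
    return noise, inputs - noise

ig = IntegratedGradients(model)
attr = ig.attribute(inp, target=0)

infid = infidelity(model, perturb_fn, inp, attr, target=0)
sens = sensitivity_max(ig.attribute, inp, target=0)
```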
In Captum 0.3.0, we also add surrogate-model interpretability methods including Lime and KernelSHAP. Lime is an interpretability method that samples points around a specified input example and uses the model's evaluations at these points to train a simpler, interpretable 'surrogate' model, such as a linear model.
We offer two implementation variants of this method, LimeBase and Lime. LimeBase provides a generic framework to train a surrogate interpretable model, while Lime provides a more specific implementation than LimeBase in order to expose a consistent API with other perturbation-based algorithms. KernelSHAP is a method that uses the Lime framework to compute Shapley Values.
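Both follow the standard attribute API; a minimal sketch with a hypothetical model `model` and input batch `inp`:

```python
from captum.attr import Lime, KernelShap

# train a linear surrogate on perturbed samples around the input
lime = Lime(model)
attr_lime = lime.attribute(inp, target=0, n_samples=200)

# KernelSHAP: Shapley values estimated via the Lime framework
kshap = KernelShap(model)
attr_kshap = kshap.attribute(inp, target=0, n_samples=200)
```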
We have added new tutorials to demonstrate Captum with CV tasks such as segmentation as well as in distributed environments. These tutorials are:
Captum 0.3.0 has added improvements to existing attribution methods including:
Captum Insights
The second release, v0.2.0, of Captum adds a variety of new attribution algorithms as well as additional tutorials, type hints, and Google Colab support for Captum Insights.
The following new attribution algorithms are provided, which can be applied to any type of PyTorch model, including DataParallel models. While the first release focused primarily on gradient-based attribution methods such as Integrated Gradients, the new algorithms include perturbation-based methods, marked by ^ below. We also add new attribution methods designed primarily for convolutional networks, denoted by * below. All attribution methods share a consistent API structure, making it easy to switch between methods; see the example after the lists below.
Attribution of model output with respect to the input features
1. Guided Backprop *
2. Deconvolution *
3. Guided GradCAM *
4. Feature Ablation ^
5. Feature Permutation ^
6. Occlusion ^
7. Shapley Value Sampling ^
Attribution of model output with respect to the layers of the model
1. Layer GradCAM
2. Layer Integrated Gradients
3. Layer DeepLIFT
4. Layer DeepLIFT SHAP
5. Layer Gradient SHAP
6. Layer Feature Ablation ^
Attribution of neurons with respect to the input features
1. Neuron DeepLIFT
2. Neuron DeepLIFT SHAP
3. Neuron Gradient SHAP
4. Neuron Guided Backprop *
5. Neuron Deconvolution *
6. Neuron Feature Ablation ^
^ Denotes Perturbation-Based Algorithm. These methods compute attribution by evaluating the model on perturbed versions of the input as opposed to using gradient information.
* Denotes attribution method designed primarily for convolutional networks.
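For instance, switching from a gradient-based to a perturbation-based method only changes the constructor; a sketch with a hypothetical model `model` and input batch `inp`:

```python
from captum.attr import IntegratedGradients, FeatureAblation

ig_attr = IntegratedGradients(model).attribute(inp, target=0)
fa_attr = FeatureAblation(model).attribute(inp, target=0)  # same call shape
```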
We have added new tutorials to demonstrate Captum on BERT models, regression cases, and using perturbation-based methods. These tutorials include:
The Captum code base is now fully typed with Python type hints and type checked using mypy. Users can now accurately type-check code using Captum.
- `attribute_to_layer_input` and `attribute_to_neuron_input` flags.

Captum Insights
We just released our first version of the PyTorch Captum library for model interpretability!
This first release, v0.1.0, supports a number of gradient-based attribution algorithms as well as Captum Insights, a visualization tool for model debugging and understanding.
The following general purpose gradient-based attribution algorithms are provided. These can be applied to any type of PyTorch model and input features, including image, text, and multimodal.
Attribution of output of the model with respect to the input features
Attribution of output of the model with respect to the layers of the model
Attribution of neurons with respect to the input features
Attribution Algorithm + noisy sampling
Since some of the algorithms, like Integrated Gradients, expand input tensors internally, we want to make sure we can scale those tensors and our forward/backward computations efficiently. For that reason, we developed a feature that internally chunks expanded tensors into pieces of `internal_batch_size` examples; this argument can be passed to `attribute` methods, and the library then runs the forward and backward passes for each batch separately and combines the results after computing gradients.
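For example, with Integrated Gradients (a sketch with a hypothetical model `model` and input batch `inp`):

```python
from captum.attr import IntegratedGradients

ig = IntegratedGradients(model)
# the n_steps expanded copies of the input are processed in chunks of
# internal_batch_size examples to bound peak memory
attr = ig.attribute(inp, target=0, n_steps=200, internal_batch_size=32)
```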
The algorithms that support batched optimization are:
PyTorch data parallel models are also supported across all Captum algorithms, allowing users to take advantage of multiple GPUs when applying interpretability algorithms.
More details on these algorithms can be found on our website at captum.ai/docs/algorithms
Captum Insights provides these algorithms in an interactive Jupyter notebook-based tool for model debugging and understanding. It can be used embedded within a notebook or run as a standalone application.
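A minimal sketch of embedding Insights in a notebook, with a hypothetical model, class list, and data loader; module paths have moved across Captum versions, so treat the imports as illustrative:

```python
from captum.insights import AttributionVisualizer, Batch
from captum.insights.attr_vis.features import ImageFeature

visualizer = AttributionVisualizer(
    models=[model],          # hypothetical trained classifier
    score_func=lambda o: o,  # map raw outputs to scores
    classes=["cat", "dog"],  # hypothetical class names
    features=[ImageFeature("Photo", baseline_transforms=[], input_transforms=[])],
    dataset=(Batch(inputs=imgs, labels=labels) for imgs, labels in loader),
)
visualizer.render()          # embeds the interactive tool in the notebook
```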
Features:
Insights is built with standard web technologies including JavaScript, CSS, React, Yarn and Flask.