automated-explanations

Generating and validating natural-language explanations.

MIT License


This repo contains code to reproduce the experiments in the GEM-V paper and the SASC paper. SASC takes in a text module and produces a natural-language explanation describing what types of inputs elicit the largest response from the module (see Fig below). GEM-V tests these explanations in detail in an fMRI setting.
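The first part of what SASC does, finding which inputs elicit the largest response from a module, can be sketched as follows. This is a minimal illustration under stated assumptions, not the repo's or imodelsX's implementation: it assumes whitespace tokenization, a module that maps a list of strings to a 1-D array of scalar responses, and hypothetical helper names (extract_ngrams, top_ngrams, max_n). In the SASC paper, the top-scoring n-grams are then summarized by an LLM into candidate explanations; that step is omitted here.

```python
from collections.abc import Callable, Sequence

import numpy as np


def extract_ngrams(texts: Sequence[str], max_n: int = 1) -> list[str]:
    """Collect unique whitespace-tokenized n-grams of length 1..max_n, in first-seen order.

    Hypothetical helper: the tokenization and n-gram range are assumptions.
    """
    seen: dict[str, None] = {}  # dict preserves insertion order and deduplicates
    for text in texts:
        tokens = text.split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                seen.setdefault(" ".join(tokens[i : i + n]), None)
    return list(seen)


def top_ngrams(
    texts: Sequence[str],
    mod: Callable[[list[str]], np.ndarray],
    max_n: int = 1,
    k: int = 3,
) -> list[str]:
    """Return the k n-grams that elicit the largest module response (hypothetical helper)."""
    candidates = extract_ngrams(texts, max_n)
    responses = np.asarray(mod(candidates), dtype=float)
    # Sort by descending response; a stable sort keeps first-seen order on ties
    order = np.argsort(-responses, kind="stable")[:k]
    return [candidates[i] for i in order]


if __name__ == "__main__":
    # Toy module: the response is simply the string length
    mod = lambda str_list: np.array([len(s) for s in str_list])
    texts = ["red", "blue", "x", "1", "2", "hippopotamus", "elephant", "rhinoceros"]
    print(top_ngrams(texts, mod, max_n=1, k=3))  # ['hippopotamus', 'rhinoceros', 'elephant']
```

On this toy data the top n-grams are all animals, which is the kind of pattern an LLM summary step would then turn into an explanation such as "names of large animals".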

SASC is similar to the nice concurrent paper by OpenAI, but it simplifies explanations to describe the module's function rather than to produce token-level activations. This makes SASC simpler and faster, and more effective at describing semantic functions from limited data (e.g. fMRI voxels), but worse at finding patterns that depend on sequence or ordering.
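Because an explanation describes the module's function rather than per-token activations, it can be checked at the level of whole texts: compare the module's response on text that matches the explanation with its response on text that does not. The sketch below assumes both sets of texts are already given (in the SASC paper they are generated by an LLM prompted with the explanation) and reports the difference in mean response in units of the module's standard deviation on a base corpus. The function name explanation_score and this exact normalization are assumptions for illustration, not the repo's API.

```python
from collections.abc import Callable, Sequence

import numpy as np

# A module maps a list of strings to a 1-D array of scalar responses
Module = Callable[[list[str]], np.ndarray]


def explanation_score(
    mod: Module,
    related: Sequence[str],
    unrelated: Sequence[str],
    base_corpus: Sequence[str],
) -> float:
    """Mean response on related text minus mean response on unrelated text,
    divided by the (population) std of responses on the base corpus.

    Hypothetical helper: the normalization choice is an assumption.
    """
    r = np.asarray(mod(list(related)), dtype=float)
    u = np.asarray(mod(list(unrelated)), dtype=float)
    sigma = float(np.std(np.asarray(mod(list(base_corpus)), dtype=float)))
    if sigma == 0.0:
        raise ValueError("module response is constant on the base corpus")
    return float((r.mean() - u.mean()) / sigma)


if __name__ == "__main__":
    # Toy module: the response is the string length
    mod = lambda str_list: np.array([len(s) for s in str_list])
    score = explanation_score(
        mod,
        related=["giraffe", "crocodile"],  # text matching "long animal names"
        unrelated=["a", "ok"],             # text that does not match
        base_corpus=["ab", "abcd"],
    )
    print(score)  # 6.5
```

A large positive score means the module responds much more strongly to text matching the explanation; a score near zero suggests the explanation does not capture what drives the module.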

For a simple scikit-learn-style interface to SASC, use the imodelsX library. Install it with pip install imodelsx; the following quickstart shows how to call it.

import numpy as np
from imodelsx import explain_module_sasc
# a toy module that responds to the length of a string
mod = lambda str_list: np.array([len(s) for s in str_list])

# a toy dataset where the longest strings are animals
text_str_list = ["red", "blue", "x", "1", "2", "hippopotamus", "elephant", "rhinoceros"]
explanation_dict = explain_module_sasc(
    text_str_list,
    mod,
    ngrams=1,
)

References

@misc{antonello2024generativeframeworkbridgedatadriven,
      title={A generative framework to bridge data-driven models and scientific theories in language neuroscience}, 
      author={Richard Antonello and Chandan Singh and Shailee Jain and Aliyah Hsu and Jianfeng Gao and Bin Yu and Alexander Huth},
      year={2024},
      eprint={2410.00812},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.00812}, 
}

@misc{singh2023explaining,
      title={Explaining black box text modules in natural language with language models}, 
      author={Chandan Singh and Aliyah R. Hsu and Richard Antonello and Shailee Jain and Alexander G. Huth and Bin Yu and Jianfeng Gao},
      year={2023},
      eprint={2305.09863},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}
Related Projects