🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
APACHE-2.0 License
Tip: For more recent evaluation approaches, for example for evaluating LLMs, we recommend our newer and more actively maintained library LightEval.
🤗 Evaluate is a library that makes evaluating and comparing models and reporting their performance easier and more standardized.
It currently contains:
A simple command like accuracy = load("accuracy") to get any of the available metrics ready to use for evaluating an ML model in any framework (NumPy/Pandas/PyTorch/TensorFlow/JAX).
The evaluate-cli create [metric name] command for adding new evaluation modules, which allows you to easily compare different metrics and their outputs for the same sets of references and predictions.
🔎 Find a metric, comparison, measurement on the Hub
🌟 Add a new evaluation module
🤗 Evaluate also has lots of useful features like:
🤗 Evaluate can be installed from PyPI and has to be installed in a virtual environment (venv or conda, for instance):
pip install evaluate
🤗 Evaluate's main methods are:
evaluate.list_evaluation_modules() to list the available metrics, comparisons and measurements
evaluate.load(module_name, **kwargs) to instantiate an evaluation module
results = module.compute(**kwargs) to compute the result of an evaluation module
First install the necessary dependencies to create a new metric with the following command:
pip install evaluate[template]
Then you can get started with the following command which will create a new folder for your metric and display the necessary steps:
evaluate-cli create "Awesome Metric"
See this step-by-step guide in the documentation for detailed instructions.
Thanks to @marella for letting us use the evaluate namespace on PyPI previously used by his library.