🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
Apache-2.0 License
- `add` method by @hazrulakmal in https://github.com/huggingface/evaluate/pull/424
- `datasets` import in Meteor metric by @mariosasko in https://github.com/huggingface/evaluate/pull/490
Full Changelog: https://github.com/huggingface/evaluate/compare/v0.4.0...v0.4.1
Published by lvwerra almost 2 years ago
- `scikit-learn` install in spaces by @lvwerra in https://github.com/huggingface/evaluate/pull/345
- `Evaluate` usage for `scikit-learn` by @awinml in https://github.com/huggingface/evaluate/pull/368
Full Changelog: https://github.com/huggingface/evaluate/compare/v0.3.0...v0.4.0
Published by lvwerra about 2 years ago
- `commit_hash` to args by @lvwerra in https://github.com/huggingface/evaluate/pull/253
- `handle_impossible_answer` from the default `PIPELINE_KWARGS` in the question answering evaluator by @fxmarty in https://github.com/huggingface/evaluate/pull/272
- `split` and `subset` kwarg into other evaluators by @mathemakitten in https://github.com/huggingface/evaluate/pull/301
- `HubEvaluationModuleFactory` by @lvwerra in https://github.com/huggingface/evaluate/pull/314
Full Changelog: https://github.com/huggingface/evaluate/compare/v0.2.2...v0.3.0
Published by lvwerra about 2 years ago
Full Changelog: https://github.com/huggingface/evaluate/compare/v0.2.1...v0.2.2
Published by lvwerra about 2 years ago
Full Changelog: https://github.com/huggingface/evaluate/compare/v0.2.0...v0.2.1
Published by lvwerra about 2 years ago
`evaluator`
The `evaluator` has been extended to three new tasks:

- "image-classification"
- "token-classification"
- "question-answering"

`combine`
With `combine` one can bundle several metrics into a single object that can be evaluated in one call and also used in combination with the `evaluator`; see the sketch below.
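A minimal sketch of these two additions, assuming an illustrative sentiment checkpoint and a small IMDB slice (neither is part of the release notes):

```python
import evaluate
from datasets import load_dataset

# Bundle several metrics into one module: a single compute() call returns all scores.
clf_metrics = evaluate.combine(["accuracy", "f1", "precision", "recall"])
print(clf_metrics.compute(predictions=[0, 1, 1], references=[0, 1, 0]))

# The combined module can also be handed to a task evaluator.
task_evaluator = evaluate.evaluator("text-classification")
results = task_evaluator.compute(
    model_or_pipeline="distilbert-base-uncased-finetuned-sst-2-english",  # illustrative checkpoint
    data=load_dataset("imdb", split="test[:100]"),                        # small slice for speed
    metric=clf_metrics,
    label_mapping={"NEGATIVE": 0, "POSITIVE": 1},  # map pipeline labels to dataset labels
)
print(results)
```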
- `evaluator` tests by @lvwerra in https://github.com/huggingface/evaluate/pull/155
- `input_texts` to `predictions` in perplexity by @lvwerra in https://github.com/huggingface/evaluate/pull/157
- `combine` to compose multiple evaluations by @lvwerra in https://github.com/huggingface/evaluate/pull/150
- `TextClassificationEvaluator` test by @fxmarty in https://github.com/huggingface/evaluate/pull/172
- `ImageClassificationEvaluator` by @fxmarty in https://github.com/huggingface/evaluate/pull/173
- `TokenClassificationEvaluator` by @fxmarty in https://github.com/huggingface/evaluate/pull/167
Full Changelog: https://github.com/huggingface/evaluate/compare/v0.1.2...v0.2.0
Published by lvwerra over 2 years ago
Full Changelog: https://github.com/huggingface/evaluate/compare/v0.1.1...v0.1.2
Published by lvwerra over 2 years ago
- `pip install evaluate[evaluator]` by @philschmid in https://github.com/huggingface/evaluate/pull/103
- `evaluate` dependency in spaces by @lvwerra in https://github.com/huggingface/evaluate/pull/88
Full Changelog: https://github.com/huggingface/evaluate/compare/v0.1.0...v0.1.1
Published by lvwerra over 2 years ago
These are the release notes of the initial release of the Evaluate library.
Goals of the Evaluate library:
- `evaluate.load()`: The `load()` function is the main entry point into Evaluate and allows loading evaluation modules from a local folder, the evaluate repository, or the Hugging Face Hub. It downloads, caches, and loads the evaluation module and returns an `evaluate.EvaluationModule` (see the usage sketch after this list).
- `evaluate.save()`: With `save()` a user can save evaluation results in a JSON file. In addition to the results from an `evaluate.EvaluationModule`, it can save additional parameters and automatically records the timestamp, git commit hash, library version, and Python path. One can either provide a directory for the results, in which case file names are created automatically, or an explicit file name for the result.
- `evaluate.push_to_hub()`: The `push_to_hub()` function allows pushing the results of a model evaluation to the model card on the Hugging Face Hub. The model, dataset, and metric are specified such that they can be linked on the Hub.
- `evaluate.EvaluationModule`: The `EvaluationModule` class is the base class for all evaluation modules. There are three module types: metrics (to evaluate models), comparisons (to compare models), and measurements (to analyze datasets). Inputs can either be added with `add` (single input) and `add_batch` (batch of inputs) followed by a final `compute` call, or all inputs can be passed to `compute` directly. Under the hood, Apache Arrow stores and loads the input data to compute the scores.
- `evaluate.EvaluationModuleInfo`: The `EvaluationModuleInfo` class stores a module's attributes:
  - `description`: A short description of the evaluation module.
  - `citation`: A BibTeX string for citation when available.
  - `features`: A `Features` object defining the input format. The inputs provided to `add`, `add_batch`, and `compute` are checked against these types, and an error is thrown in case of a mismatch.
  - `inputs_description`: This is equivalent to the module's docstring.
  - `homepage`: The homepage of the module.
  - `license`: The license of the module.
  - `codebase_urls`: Links to the code behind the module.
  - `reference_urls`: Additional reference URLs.
- `evaluate.evaluator`: The `evaluator` provides automated evaluation and only requires a model, a dataset, and a metric, in contrast to the metrics in `EvaluationModule`, which require model predictions. It has three main components: a model wrapped in a pipeline, a dataset, and a metric, and it returns the computed evaluation scores. It may also require two mappings to align the dataset columns and the pipeline labels with the dataset's labels. This is an experimental feature -- currently, only text classification is supported.
- `evaluate-cli`: The community can add custom metrics by adding the necessary module script to a Space on the Hugging Face Hub. The `evaluate-cli` is a tool that simplifies this process by creating the Space, populating a template, and pushing it to the Hub. It also provides instructions to customize the template and integrate custom logic.
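As a rough illustration of how `load`, the `EvaluationModule` input methods, and `save` fit together, here is a minimal sketch (the metric name, example values, and output path are illustrative, not taken from the release notes):

```python
import evaluate

# Load a metric module; it is downloaded from the Hub and cached on first use.
accuracy = evaluate.load("accuracy")

# The EvaluationModuleInfo attributes are exposed on the loaded module.
print(accuracy.description)
print(accuracy.features)

# Option 1: stream inputs with add()/add_batch(), then finalize with compute().
accuracy.add_batch(predictions=[0, 1, 1], references=[0, 1, 0])
accuracy.add(prediction=1, reference=1)
print(accuracy.compute())

# Option 2: pass all inputs to compute() directly.
result = evaluate.load("accuracy").compute(predictions=[0, 1, 1, 1], references=[0, 1, 0, 1])

# Save the results as JSON; passing a directory (assumed to exist) lets Evaluate
# generate the file name and record timestamp, git hash, library version, and Python path.
evaluate.save("./results/", **result)
```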
@lvwerra, @sashavor, @NimaBoscarino, @ola13, @osanseviero, @lhoestq, @lewtun, @douwekiela