🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
Apache-2.0 License
- `add` method by @hazrulakmal in https://github.com/huggingface/evaluate/pull/424
- `datasets` import in Meteor metric by @mariosasko in https://github.com/huggingface/evaluate/pull/490
Full Changelog: https://github.com/huggingface/evaluate/compare/v0.4.0...v0.4.1
Published by lvwerra almost 2 years ago
- `scikit-learn` install in spaces by @lvwerra in https://github.com/huggingface/evaluate/pull/345
- `Evaluate` usage for `scikit-learn` by @awinml in https://github.com/huggingface/evaluate/pull/368
Full Changelog: https://github.com/huggingface/evaluate/compare/v0.3.0...v0.4.0
Published by lvwerra about 2 years ago
- `commit_hash` to args by @lvwerra in https://github.com/huggingface/evaluate/pull/253
- `handle_impossible_answer` from the default `PIPELINE_KWARGS` in the question answering evaluator by @fxmarty in https://github.com/huggingface/evaluate/pull/272
- `split` and `subset` kwarg into other evaluators by @mathemakitten in https://github.com/huggingface/evaluate/pull/301
- `HubEvaluationModuleFactory` by @lvwerra in https://github.com/huggingface/evaluate/pull/314
Full Changelog: https://github.com/huggingface/evaluate/compare/v0.2.2...v0.3.0
Published by lvwerra about 2 years ago
Full Changelog: https://github.com/huggingface/evaluate/compare/v0.2.1...v0.2.2
Published by lvwerra about 2 years ago
Full Changelog: https://github.com/huggingface/evaluate/compare/v0.2.0...v0.2.1
Published by lvwerra about 2 years ago
`evaluator`
The `evaluator` has been extended to three new tasks:

- "image-classification"
- "token-classification"
- "question-answering"

`combine`
With `combine` one can bundle several metrics into a single object that can be evaluated in one call and also used in combination with the `evaluator`; see the sketch below.
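A minimal sketch of these two additions, assuming an illustrative sentiment checkpoint and a small IMDB slice (neither is part of the release notes):

```python
import evaluate
from datasets import load_dataset

# Bundle several metrics into one module: a single compute() call returns all scores.
clf_metrics = evaluate.combine(["accuracy", "f1", "precision", "recall"])
print(clf_metrics.compute(predictions=[0, 1, 1], references=[0, 1, 0]))

# The combined module can also be handed to a task evaluator.
task_evaluator = evaluate.evaluator("text-classification")
results = task_evaluator.compute(
    model_or_pipeline="distilbert-base-uncased-finetuned-sst-2-english",  # illustrative checkpoint
    data=load_dataset("imdb", split="test[:100]"),                        # small slice for speed
    metric=clf_metrics,
    label_mapping={"NEGATIVE": 0, "POSITIVE": 1},  # map pipeline labels to dataset labels
)
print(results)
```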
- `evaluator` tests by @lvwerra in https://github.com/huggingface/evaluate/pull/155
- `input_texts` to `predictions` in perplexity by @lvwerra in https://github.com/huggingface/evaluate/pull/157
- `combine` to compose multiple evaluations by @lvwerra in https://github.com/huggingface/evaluate/pull/150
- `TextClassificationEvaluator` test by @fxmarty in https://github.com/huggingface/evaluate/pull/172
- `ImageClassificationEvaluator` by @fxmarty in https://github.com/huggingface/evaluate/pull/173
- `TokenClassificationEvaluator` by @fxmarty in https://github.com/huggingface/evaluate/pull/167
Full Changelog: https://github.com/huggingface/evaluate/compare/v0.1.2...v0.2.0
Published by lvwerra over 2 years ago
Full Changelog: https://github.com/huggingface/evaluate/compare/v0.1.1...v0.1.2
Published by lvwerra over 2 years ago
- `pip install evaluate[evaluator]` by @philschmid in https://github.com/huggingface/evaluate/pull/103
- `evaluate` dependency in spaces by @lvwerra in https://github.com/huggingface/evaluate/pull/88
Full Changelog: https://github.com/huggingface/evaluate/compare/v0.1.0...v0.1.1
Published by lvwerra over 2 years ago
These are the release notes of the initial release of the Evaluate library.
Goals of the Evaluate library:
- `evaluate.load()`: The `load()` function is the main entry point into Evaluate and allows loading evaluation modules from a local folder, the evaluate repository, or the Hugging Face Hub. It downloads, caches, and loads the evaluation module and returns an `evaluate.EvaluationModule` (see the usage sketch after this list).
- `evaluate.save()`: With `save()` a user can save evaluation results in a JSON file. In addition to the results from an `evaluate.EvaluationModule`, it can save additional parameters and automatically records the timestamp, git commit hash, library version, and Python path. One can either provide a directory for the results, in which case file names are created automatically, or an explicit file name for the result.
- `evaluate.push_to_hub()`: The `push_to_hub()` function allows pushing the results of a model evaluation to the model card on the Hugging Face Hub. The model, dataset, and metric are specified such that they can be linked on the Hub.
- `evaluate.EvaluationModule`: The `EvaluationModule` class is the base class for all evaluation modules. There are three module types: metrics (to evaluate models), comparisons (to compare models), and measurements (to analyze datasets). Inputs can either be added with `add` (single input) and `add_batch` (batch of inputs) followed by a final `compute` call, or all inputs can be passed to `compute` directly. Under the hood, Apache Arrow stores and loads the input data to compute the scores.
- `evaluate.EvaluationModuleInfo`: The `EvaluationModuleInfo` class stores a module's attributes:
  - `description`: A short description of the evaluation module.
  - `citation`: A BibTeX string for citation when available.
  - `features`: A `Features` object defining the input format. The inputs provided to `add`, `add_batch`, and `compute` are checked against these types, and an error is thrown in case of a mismatch.
  - `inputs_description`: This is equivalent to the module's docstring.
  - `homepage`: The homepage of the module.
  - `license`: The license of the module.
  - `codebase_urls`: Links to the code behind the module.
  - `reference_urls`: Additional reference URLs.
- `evaluate.evaluator`: The `evaluator` provides automated evaluation and only requires a model, a dataset, and a metric, in contrast to the metrics in `EvaluationModule`, which require model predictions. It has three main components: a model wrapped in a pipeline, a dataset, and a metric, and it returns the computed evaluation scores. It may also require two mappings to align the dataset columns and the pipeline labels with the dataset's labels. This is an experimental feature -- currently, only text classification is supported.
- `evaluate-cli`: The community can add custom metrics by adding the necessary module script to a Space on the Hugging Face Hub. The `evaluate-cli` is a tool that simplifies this process by creating the Space, populating a template, and pushing it to the Hub. It also provides instructions to customize the template and integrate custom logic.
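As a rough illustration of how `load`, the `EvaluationModule` input methods, and `save` fit together, here is a minimal sketch (the metric name, example values, and output path are illustrative, not taken from the release notes):

```python
import evaluate

# Load a metric module; it is downloaded from the Hub and cached on first use.
accuracy = evaluate.load("accuracy")

# The EvaluationModuleInfo attributes are exposed on the loaded module.
print(accuracy.description)
print(accuracy.features)

# Option 1: stream inputs with add()/add_batch(), then finalize with compute().
accuracy.add_batch(predictions=[0, 1, 1], references=[0, 1, 0])
accuracy.add(prediction=1, reference=1)
print(accuracy.compute())

# Option 2: pass all inputs to compute() directly.
result = evaluate.load("accuracy").compute(predictions=[0, 1, 1, 1], references=[0, 1, 0, 1])

# Save the results as JSON; passing a directory (assumed to exist) lets Evaluate
# generate the file name and record timestamp, git hash, library version, and Python path.
evaluate.save("./results/", **result)
```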
@lvwerra, @sashavor, @NimaBoscarino, @ola13, @osanseviero, @lhoestq, @lewtun, @douwekiela