Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
APACHE-2.0 License
Published by frascuchon over 2 years ago
You can now build multilabel text classification datasets using query-based rules
If you want to get started, check out this tutorial.
https://user-images.githubusercontent.com/1107111/160930404-7b909f1e-b871-4e4c-b1c8-ea9eabfcad21.mp4
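To make the idea concrete, here is a minimal, library-free sketch of how query-based rules can produce multilabel weak labels. It does not use the Rubrix API: the Rule class, the apply_rules helper, the sample texts, and the keyword matching that stands in for a real search query are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: records matching `query` receive every label in `labels`."""
    query: str               # a single keyword; stands in for a real search query
    labels: Tuple[str, ...]  # in the multilabel case a rule may vote for several labels

    def matches(self, text: str) -> bool:
        # Case-insensitive whole-word match, a deliberately crude stand-in for search.
        return self.query.lower() in text.lower().split()


def apply_rules(texts: List[str], rules: List[Rule]) -> List[Set[str]]:
    """Return, for each text, the union of labels voted by all matching rules."""
    weak_labels = []
    for text in texts:
        votes: Set[str] = set()
        for rule in rules:
            if rule.matches(text):
                votes.update(rule.labels)
        weak_labels.append(votes)
    return weak_labels


texts = [
    "refund my payment please",
    "the app crashes on login",
    "payment page crashes every time",
    "great service",
]
rules = [
    Rule(query="payment", labels=("billing",)),
    Rule(query="crashes", labels=("bug",)),
    Rule(query="refund", labels=("billing", "complaint")),
]

weak_labels = apply_rules(texts, rules)
for text, labels in zip(texts, weak_labels):
    print(f"{text!r} -> {sorted(labels)}")
```

Unlike the single-label case, a record matched by several rules keeps all of their labels, and a record matched by none stays unlabeled.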
You can now read ANY text classification, NER, or text2text dataset directly from the Hub and load it into Rubrix.
To understand how Rubrix datasets work, check out this guide.
Organizing teams and datasets is a key Rubrix feature. After several rounds of feedback with early users, we've completely redesigned the user experience. Let us know what you think.
To get started, configure users and workspaces by following this guide.
We have included a new in-depth guide about the Lucene-based query language and data model used for search, weak labeling, loading subsets of data, and metrics.
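As a rough illustration of what a Lucene-style query string looks like, the sketch below assembles one from field/value pairs. The build_query helper is hypothetical, and the field names used here (status, metadata.split, text) are assumptions; the guide is the authority on which fields the data model actually exposes.

```python
from typing import Mapping


def build_query(clauses: Mapping[str, str]) -> str:
    """Join `field:value` clauses with AND (hypothetical helper, not part of Rubrix)."""
    parts = []
    for field, value in clauses.items():
        # Quote values containing whitespace so they are treated as a phrase.
        # Note: this sketch does not escape Lucene special characters.
        if any(ch.isspace() for ch in value):
            value = f'"{value}"'
        parts.append(f"{field}:{value}")
    return " AND ".join(parts)


# Field names below are assumptions made for illustration only.
query = build_query(
    {
        "status": "Validated",
        "metadata.split": "train",
        "text": "late delivery",
    }
)
print(query)
# status:Validated AND metadata.split:train AND text:"late delivery"
```

A string of this shape is the kind of value that would be used for search, for defining a weak labeling rule, or for loading a subset of a dataset.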
Published by frascuchon over 2 years ago
Now you can use filters in the Define Rules mode (weak labeling). These filters are useful for seeing the impact of rules on specific dataset subpopulations/subsets (e.g., with certain metadata fields, annotated records, etc.):
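The effect of such a filter can be sketched without Rubrix at all: compute a rule's coverage on the full dataset and again on a filtered subset. Everything below is hypothetical (the record layout, the coverage helper, and the keyword rule standing in for a query), and it only illustrates why per-subset numbers can differ from the overall ones.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def coverage(records: List[Record], rule: Callable[[Record], bool]) -> float:
    """Fraction of records matched by the rule (0.0 for an empty list)."""
    if not records:
        return 0.0
    return sum(1 for record in records if rule(record)) / len(records)


# Hypothetical records with a metadata field and an annotation flag.
records = [
    {"text": "refund my payment", "metadata": {"source": "email"}, "annotated": True},
    {"text": "payment failed again", "metadata": {"source": "chat"}, "annotated": False},
    {"text": "love the new design", "metadata": {"source": "email"}, "annotated": False},
    {"text": "where is my payment", "metadata": {"source": "email"}, "annotated": True},
]


def payment_rule(record: Record) -> bool:
    # Keyword match standing in for a query-based rule.
    return "payment" in record["text"].split()


# The "filter": keep only the records that came in by email.
email_only = [r for r in records if r["metadata"]["source"] == "email"]

overall = coverage(records, payment_rule)    # 3 of 4 records match
subset = coverage(email_only, payment_rule)  # 2 of 3 records match
print(f"overall coverage: {overall:.2f}, email-only coverage: {subset:.2f}")
```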
None for event_timestamp (#1105) (21e78e4)
Published by frascuchon over 2 years ago
rb.Dataset* and 🤗 Hub integration
The Dataset classes are lightweight containers for Rubrix records. These classes facilitate importing from and exporting to different formats (e.g., pandas.DataFrame, datasets.Dataset) as well as sharing and versioning Rubrix datasets using the Hugging Face Hub.
With this release, Rubrix users and teams can use the Hugging Face Hub to share and read both public and private Rubrix datasets for the TextClassification, TokenClassification, and Text2Text tasks. This makes it much easier to reproduce and share data. Let's see an example:
import rubrix as rb
from datasets import load_dataset

# 🧑🏻 🏷️ Leire has labeled a text classification dataset using a local Rubrix instance
dataset_rb = rb.load("text_classification_ds", as_pandas=False)
# 🧑🏻 exports a Rubrix Dataset to a hf Dataset
dataset_ds = dataset_rb.to_datasets()
# 🧑🏻 Leire shares the labelled dataset with the world
dataset_ds.push_to_hub("text_classification_ds")

# 👨 John downloads the dataset from the Hugging Face Hub
dataset_ds = load_dataset("leire/text_classification_ds", split="train")
# 👨 reads in dataset
dataset_rb = rb.read_datasets(dataset_ds, task="TextClassification")
# 👨 🏷️ logs the dataset and continues labeling with his own Rubrix instance
rb.log(dataset_rb, "john_text_classification_ds")
You can read more at https://rubrix.readthedocs.io/en/stable/guides/datasets.html
For each record type, there's a corresponding Dataset class called DatasetFor<RecordType>. You can look up their API in the reference section.
The UI for Token Classification has been completely redesigned to provide a better user experience for exploration and annotation. This is the first of a set of changes focusing on annotation productivity for token classification.
Published by frascuchon over 2 years ago
python -m rubrix)
LabelModel.score method (#979) (2887907), closes #953
rules in WeakLabels (#976) (34389d3), closes #955 #1011
Published by frascuchon over 2 years ago
Published by frascuchon almost 3 years ago
We are glad to introduce the most important feature to date: now it's possible to iterate on labeling queries directly in the UI with initial support for multi-class text classification. Multilabel and token classification support is coming soon.
See the video for the recommended workflow:
https://user-images.githubusercontent.com/1107111/149346471-93cbd7ee-96a2-451a-8f5e-f9e26b246407.mp4
Check the updated tutorial: https://rubrix.readthedocs.io/en/master/tutorials/weak-supervision-with-rubrix.html
conda install instruction (#788)
Published by frascuchon almost 3 years ago
conda install instruction (#788)
Full Changelog: https://github.com/recognai/rubrix/compare/v0.7.0...v0.8.0-alpha.0
Published by frascuchon almost 3 years ago
Rubrix Workspaces let you organize your data collection and monitoring workflows much more flexibly than before. Workspaces can be project-based (separating work across projects), team-based (organizing work across teams), model-based (organizing data collection and monitoring per model or model group), or organized any other way you need. A workspace is a Rubrix "space" where users can collaborate, using both the Webapp and the Python client. There are two types of workspace:
Team workspace: Where one or several users have read/write access.
User workspace: Every user gets their own user workspace. It is the default workspace when users log in, and when they log and load data with the Python client. The name of this workspace corresponds to the username.
Additionally, you can still use tags and metadata to structure datasets inside a workspace.
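The workspace semantics described above can be modeled in a few lines. This is only an illustration and not Rubrix server or client code: the User class, its fields, and the can_access method are hypothetical, and the sketch assumes a user workspace is private to its owner, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class User:
    """Hypothetical model of a user and the workspaces they can reach."""
    username: str
    team_workspaces: FrozenSet[str] = frozenset()

    @property
    def default_workspace(self) -> str:
        # The user workspace is named after the username and is the default.
        return self.username

    def can_access(self, workspace: str) -> bool:
        # Assumption: access is limited to the user's own workspace and their teams.
        return workspace == self.username or workspace in self.team_workspaces


leire = User("leire", frozenset({"nlp-team"}))
john = User("john")

print(leire.default_workspace)       # leire
print(leire.can_access("nlp-team"))  # True
print(john.can_access("nlp-team"))   # False
```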
The setup should be straightforward; you can find all the details here: https://rubrix.readthedocs.io/en/stable/getting_started/user-management.html.
On the Python library side, to learn how to log and load data from different workspaces, check the Python client API docs: https://rubrix.readthedocs.io/en/stable/reference/python/python_client.html
The API docs for the weak supervision model can be found here: https://rubrix.readthedocs.io/en/stable/reference/python/python_labeling.html#python-labeling
Refined the annotation module for text classification, especially for text classification with a high number of labels
Extended the support for Rubrix Metrics; check this guide for more information: https://rubrix.readthedocs.io/en/stable/guides/metrics.html
To use this new release, do not forget to upgrade.
Update the client library:
pip install -U rubrix
If you are using Docker:
docker-compose pull
docker-compose up
If you are using the python server:
pip install -U "rubrix[server]"
RubrixClient out of init (#563) by David Fidalgo
setup.cfg (#562) by David Fidalgo
rb.load for ids with mixed types (#577) by David Fidalgo
rb.log (#609) by David Fidalgo
Full Changelog: https://github.com/recognai/rubrix/compare/v0.6.2...v0.7.0
Published by frascuchon almost 3 years ago
RubrixClient out of init (#563) by David Fidalgo
setup.cfg (#562) by David Fidalgo
rb.load for ids with mixed types (#577) by David Fidalgo
rb.log (#609) by David Fidalgo
Full Changelog: https://github.com/recognai/rubrix/compare/v0.6.2...v0.7.0-alpha.1
Published by frascuchon almost 3 years ago
RubrixClient out of init (#563) by David Fidalgo
setup.cfg (#562) by David Fidalgo
rb.load for ids with mixed types (#577) by David Fidalgo
rb.log (#609) by David Fidalgo
Full Changelog: https://github.com/recognai/rubrix/compare/v0.6.2...v0.7.0-alpha.0