Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
APACHE-2.0 License
Published by frascuchon about 1 year ago
- `ArgillaTrainer` integration with sentence-transformers, allowing fine-tuning for sentence similarity (#3739)
- `ArgillaTrainer` integration with `TrainingTask.for_question_answering` (#3740)
- Auto save record to automatically save the current record that you are working on (#3541)
- `ArgillaTrainer` integration with OpenAI, allowing fine-tuning for chat completion (#3615)
- `workspaces list` command to list Argilla workspaces (#3594)
- `datasets list` command to list Argilla datasets (#3658)
- `users create` command to create users (#3667)
- `whoami` command to get the current user (#3673)
- `users delete` command to delete users (#3671)
- `users list` command to list users (#3688)
- `workspaces delete-user` command to remove a user from a workspace (#3699)
- `workspaces create` command to create an Argilla workspace (#3676)
- `datasets push-to-hub` command to push a `FeedbackDataset` from Argilla into the HuggingFace Hub (#3685)
- `info` command to get info about the used Argilla client and server (#3707)
- `datasets delete` command to delete a `FeedbackDataset` from Argilla (#3703)
- `created_at` and `updated_at` properties to `RemoteFeedbackDataset` and `FilteredRemoteFeedbackDataset` (#3709)
- `PermissionError` when executing a command with a logged-in user with not enough permissions (#3717)
- `workspaces add-user` command to add a user to a workspace (#3712)
- `workspace_id` param to the `GET /api/v1/me/datasets` endpoint (#3727)
- `workspace_id` arg to `list_datasets` in the Python SDK (#3727)
- `argilla` script that allows executing the Argilla CLI using the `argilla` command (#3730)
- `server_info` function to check the Argilla server information (also accessible via `rg.server_info`) (#3772)
- `database` commands under the `server` group of commands (#3710)
- `server` commands only included in the CLI app when the `server` extra requirements are installed (#3710)
- `PUT /api/v1/responses/{response_id}` to replace `values` stored with received `values` in the request (#3711)
- `UserWarning` when the `user_id` in `Workspace.add_user` and `Workspace.delete_user` is the ID of a user with the owner role, as they don't require explicit permissions (#3716)
- `tasks` sub-package renamed to `cli` (#3723)
- `argilla database` command in the CLI is now accessed via `argilla server database`, to be deprecated in the upcoming release (#3754)
- `visible_options` (of label and multi-label selection questions) validation in the backend to check that the provided value is greater than or equal to 3 and less than or equal to the number of provided options (#3773)
- Remove user modification in text component on clear answers (#3775)
- Highlight raw text field in dataset feedback task (#3731)
- Field title too long (#3734)
- `DatasetForTextClassification` (#3652)
- Pending queue pagination problems during data annotation (#3677)
- `visible_labels` default value to be 20 just when `visible_labels` is not provided and `len(labels) > 20`; otherwise it will be either the provided `visible_labels` value or `None`, for `LabelQuestion` and `MultiLabelQuestion` (#3702)
- `DatasetCard` generation when `RemoteFeedbackDataset` contains suggestions (#3718)
- `draft` status in `ResponseSchema`, as now there can be responses with `draft` status when annotating via the UI (#3749)
- `/api/datasets` endpoints due to the `TaskType` enum replacement in the endpoint URL (#3769)

Full Changelog: https://github.com/argilla-io/argilla/compare/v1.15.1...v1.16.0
Published by damianpumar about 1 year ago
- Text component: text content sanitization behavior just for markdown, to prevent the text from disappearing (#3738)
- Text component: now you need to press Escape to exit the text area (#3733)
- `SearchEngine` was creating the same number of primary shards and replica shards for each `FeedbackDataset` (#3736)

Published by damianpumar about 1 year ago
Argilla 1.15.0 comes with an enhanced `FeedbackDataset` settings page enabling the update of the dataset settings, an integration of the TRL package with the `ArgillaTrainer`, and continues adding improvements to the Python client for managing `FeedbackDataset`s.

`FeedbackDataset` settings from the UI

The `FeedbackDataset` settings page has been updated and now allows updating the `guidelines` and some attributes of the `fields` and `questions` of the dataset. Did you misspell the title or description of a field or question? Well, you don't have to remove your dataset and create it again anymore! Just go to the settings page and fix it.

`ArgillaTrainer`

The famous TRL package for training Transformers with Reinforcement Learning techniques has been integrated with the `ArgillaTrainer`, which comes with four new `TrainingTask`s: SFT, Reward Modeling, PPO and DPO. Each training task expects a formatting function that will return the data in the expected format for training the model.

Check this tutorial for training a Reward Model using the Argilla Trainer.
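A formatting function is just a callable mapping a record's fields to the shape the task expects. A minimal sketch for SFT (the field names `question` and `answer` are illustrative; the function would be handed to the corresponding `TrainingTask`):

```python
def formatting_func(sample):
    # `sample` is assumed to behave like a dict of field names to values.
    # For supervised fine-tuning, return the raw text to train on.
    return f"### Question: {sample['question']}\n### Answer: {sample['answer']}"

# Text a single record would produce:
text = formatting_func({"question": "What is Argilla?", "answer": "A data annotation tool."})
```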
Filter records in a `FeedbackDataset` and remove suggestions

In the 1.14.0 release we added many improvements for working with remote `FeedbackDataset`s. In this release, a new `filter_by` method has been added that allows filtering the records of a dataset from the Python client. For now, records can only be filtered using the `response_status`, but we're planning to add more complex filters in upcoming releases. In addition, new methods have been added that allow removing the suggestions created for a record.
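Conceptually, `filter_by(response_status=...)` keeps only the records whose responses match the given status. A plain-Python analogue of that behavior (the record shape and helper name are illustrative, not the actual SDK internals):

```python
records = [
    {"id": 1, "response_status": "submitted"},
    {"id": 2, "response_status": "discarded"},
    {"id": 3, "response_status": "submitted"},
]

def filter_by_status(records, response_status):
    # Keep only records whose response status matches the requested one.
    return [r for r in records if r["response_status"] == response_status]

submitted = filter_by_status(records, "submitted")  # records 1 and 3
```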
- Enable updating guidelines and dataset settings for Feedback Datasets directly in the UI (#3489)
- `ArgillaTrainer` integration with TRL, allowing for easy supervised finetuning, reward modeling, direct preference optimization and proximal policy optimization (#3467)
- `formatting_func` to `ArgillaTrainer` for `FeedbackDataset` datasets to add a custom formatting for the data (#3599)
- `login` function in `argilla.client.login` to login into an Argilla server and store the credentials locally (#3582)
- `login` command to login into an Argilla server (#3600)
- `logout` command to logout from an Argilla server (#3605)
- `DELETE /api/v1/suggestions/{suggestion_id}` endpoint to delete a suggestion given its ID (#3617)
- `DELETE /api/v1/records/{record_id}/suggestions` endpoint to delete several suggestions linked to the same record given their IDs (#3617)
- `response_status` param to `GET /api/v1/datasets/{dataset_id}/records` to be able to filter by `response_status`, as previously included for `GET /api/v1/me/datasets/{dataset_id}/records` (#3613)
- `list` classmethod to `ArgillaMixin` to be used as `FeedbackDataset.list()`, also including the `workspace` to list from as an arg (#3619)
- `filter_by` method in `RemoteFeedbackDataset` to filter based on `response_status` (#3610)
- `list_workspaces` function (to be used as `rg.list_workspaces`, but `Workspace.list` is preferred) to list all the workspaces of a user in Argilla (#3641)
- `list_datasets` function (to be used as `rg.list_datasets`) to list the `TextClassification`, `TokenClassification`, and `Text2Text` datasets in Argilla (#3638)
- `RemoteSuggestionSchema` to manage suggestions in Argilla, including the `delete` method to delete suggestions from Argilla via `DELETE /api/v1/suggestions/{suggestion_id}` (#3651)
- `delete_suggestions` to `RemoteFeedbackRecord` to remove suggestions from Argilla via `DELETE /api/v1/records/{record_id}/suggestions` (#3651)
- Optional label for the * mark of required questions (#3608)
- `RemoteFeedbackDataset.delete_records` to use the batch delete records endpoint (#3580)
- `allowed_for_roles` for some `RemoteFeedbackDataset`, `RemoteFeedbackRecords`, and `RemoteFeedbackRecord` methods that are only allowed for users with the `owner` and `admin` roles (#3601)
- `ArgillaToFromMixin` renamed to `ArgillaMixin` (#3619)
- `users` CLI app moved under the `database` CLI app (#3593)
- `Enum` classes moved to the `argilla.server.enums` module (#3620)
- Filter by workspace in breadcrumbs (#3577)
- Filter by workspace in datasets table (#3604)
- Query search highlight for Text2Text and TextClassification (#3621)
- `RatingQuestion.values` validation to raise a `ValidationError` when values are out of range, i.e. [1, 10] (#3626)
- `multi_task_text_token_classification` removed from `TaskType` as not used (#3640)
- `argilla_id` deprecated in favor of `id` in `RemoteFeedbackDataset` (#3663)
- `fetch_records` deprecated in `RemoteFeedbackDataset`, as now the records are lazily fetched from Argilla (#3663)
- `push_to_argilla` deprecated in `RemoteFeedbackDataset`, as it just works when calling it through a `FeedbackDataset` locally, since the updates of the remote datasets are automatically pushed to Argilla (#3663)
- `set_suggestions` deprecated in favor of `update(suggestions=...)` for both `FeedbackRecord` and `RemoteFeedbackRecord`, as all the updates of any "updateable" attribute of a record will go through `update` instead (#3663)
- `owner` attribute for the client Dataset data model (#3665)

Full Changelog: https://github.com/argilla-io/argilla/compare/v1.14.1...v1.15.0
Published by gabrielmbmb about 1 year ago
- `begin_nested` because of missing `commit` (#3567)

Full Changelog: https://github.com/argilla-io/argilla/compare/v1.14.0...v1.14.1
Published by gabrielmbmb about 1 year ago
Argilla 1.14.0 comes packed with improvements to manage Feedback Datasets from the Python client. Here are the most important changes in this version:
Pushing a dataset to Argilla will now create a `RemoteFeedbackDataset` in Argilla. To make changes to your dataset in Argilla you will need to make those updates to the remote dataset. You can do so by either using the dataset returned by the `push_to_argilla()` method (as shown in the image above) or by loading the dataset like so:
```python
import argilla as rg

# connect to Argilla
rg.init(api_url="...", api_key="...")

# get the existing dataset in Argilla
remote_dataset = rg.FeedbackDataset.from_argilla(name="my-dataset", workspace="my-workspace")

# add a list of FeedbackRecords to the dataset in Argilla
remote_dataset.add_records(...)
```
Alternatively, you can make a local copy of the dataset using the `pull()` method.

```python
local_dataset = remote_dataset.pull()
```
Note that any changes that you make to this local dataset will not affect the remote dataset in Argilla.
How to add records to an existing dataset in Argilla was demonstrated in the first code snippet in the "Pushing and pulling a dataset" section. This is how you can delete a list of records using that same dataset:
```python
records_to_delete = remote_dataset.records[0:5]
remote_dataset.delete_records(records_to_delete)
```
Or delete a single record:
```python
record = remote_dataset.records[-1]
record.delete()
```
To add and update suggestions in existing records, you can simply use the `update()` method. For example:

```python
for record in remote_dataset.records:
    record.update(suggestions=...)
```
Note that adding a suggestion to a question that already has one will overwrite the previous suggestion. To learn more about the format that the suggestions must follow, check our docs.
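As a sketch of the expected shape, a suggestion points at a question by name together with the suggested value (the names and values below are illustrative; see the docs for the full schema):

```python
# Illustrative suggestion payload for a record: one suggestion per question.
suggestion = {
    "question_name": "quality",  # name of the question the suggestion answers
    "value": "good",             # suggested value, matching the question type
}

# It would then be passed along, e.g. record.update(suggestions=[suggestion])
```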
You can now easily delete datasets from the Python client. To do that, get the existing dataset like demonstrated in the first section and just use:
```python
remote_dataset.delete()
```
Now you can create a user and directly assign existing workspaces to grant them access.
```python
user = rg.User.create(username="...", first_name="...", password="...", workspaces=["ws1", "ws2"])
```
- `PATCH /api/v1/fields/{field_id}` endpoint to update the field title and markdown settings (#3421)
- `PATCH /api/v1/datasets/{dataset_id}` endpoint to update dataset name and guidelines (#3402)
- `PATCH /api/v1/questions/{question_id}` endpoint to update question title, description and some settings (depending on the type of question) (#3477)
- `DELETE /api/v1/records/{record_id}` endpoint to remove a record given its ID (#3337)
- `pull` method in `RemoteFeedbackDataset` (a `FeedbackDataset` pushed to Argilla) to pull all the records from it and return them as a local copy, as a `FeedbackDataset` (#3465)
- `delete` method in `RemoteFeedbackDataset` (a `FeedbackDataset` pushed to Argilla) (#3512)
- `delete_records` method in `RemoteFeedbackDataset`, and `delete` method in `RemoteFeedbackRecord`, to delete records from Argilla (#3526)
- `ArgillaDatasetMixin` to detach the Argilla-related functionality from the `FeedbackDataset` (#3427)
- `FeedbackDataset`-related `pydantic.BaseModel` schemas moved to `argilla.client.feedback.schemas` instead, to be better structured and more scalable and maintainable (#3427)
- `POST /api/users` endpoint to be able to provide a list of workspace names to which the user should be linked (#3462)
- `User.create` method to be able to provide a list of workspace names to which the user should be linked (#3462)
- `GET /api/v1/me/datasets/{dataset_id}/records` endpoint to allow getting records matching one of the response statuses provided via query param (#3359)
- `POST /api/v1/me/datasets/{dataset_id}/records` endpoint to allow searching records matching one of the response statuses provided via query param (#3359)
- `SearchEngine.search` method to allow searching records matching one of the response statuses provided (#3359)
- After `FeedbackDataset.push_to_argilla`, the methods `FeedbackDataset.add_records` and `FeedbackRecord.set_suggestions` will automatically call Argilla with no need of calling `push_to_argilla` explicitly (#3465)
- `FeedbackDataset.push_to_huggingface` dumps the `responses` as a `List[Dict[str, Any]]` instead of `Sequence` to make it more readable via 🤗 `datasets` (#3539)
- `bool` values and `default` from Jinja2 while generating the HuggingFace `DatasetCard` from `argilla_template.md` (#3499)
- `DatasetConfig.from_yaml`, which was failing when calling `FeedbackDataset.from_huggingface` as the UUIDs cannot be deserialized automatically by `PyYAML`, so UUIDs are neither dumped nor loaded anymore (#3502)
- `TextClassificationSettings` and `TokenClassificationSettings` labels are properly parsed to strings both in the Python client and in the backend endpoint (#3495)
- `PUT /api/v1/datasets/{dataset_id}/publish` to check whether at least one field and question has `required=True` (#3511)
- `FeedbackDataset.from_huggingface`, as `suggestions` were being lost when there were no `responses` (#3539)
- `QuestionSchema` and `FieldSchema` not validating the `name` attribute (#3550)
- After `FeedbackDataset.push_to_argilla`, calling `push_to_argilla` again won't do anything since the dataset is already pushed to Argilla (#3465)
- After `FeedbackDataset.push_to_argilla`, calling `fetch_records` won't do anything since the records are lazily fetched from Argilla (#3465)
- After `FeedbackDataset.push_to_argilla`, the Argilla ID is no longer stored in the attribute/property `argilla_id` but in `id` instead (#3465)

Full Changelog: https://github.com/argilla-io/argilla/compare/v1.13.3...v1.14.0
Published by frascuchon about 1 year ago
- `ModuleNotFoundError` caused because the `argilla.utils.telemetry` module used in the `ArgillaTrainer` was importing an optional dependency not installed by default (#3471)
- `ImportError` caused because the `argilla.client.feedback.config` module was importing the `pyyaml` optional dependency not installed by default (#3471)

Full Changelog: https://github.com/argilla-io/argilla/compare/v1.13.2...v1.13.3
Published by frascuchon over 1 year ago
Full Changelog: https://github.com/argilla-io/argilla/compare/v1.13.0...v1.13.1
Published by frascuchon over 1 year ago
All question types in the Feedback task support suggestions, but you can only add one suggestion per question.
Learn more about this feature in our docs.
We've added functionalities to list all the workspaces that a user has access to. From the Python client you will be able to list all workspaces of the current user using rg.Workspace.list()
and in the UI you will be able to see the list of workspaces in the user settings page.
Read more in the docs.
We are extending the support we give to help prepare data from Feedback datasets for use during training. As part of this release we include strategies to unify responses to `RankingQuestion`s and also provide a task mapping for text classification, `TrainingTaskMapping.for_text_classification`.
Read more about how to use these methods to train models with Feedback collected in Argilla here.
- `GET /api/v1/users/{user_id}/workspaces` endpoint to list the workspaces to which a user belongs (#3308 and #3343)
- `HuggingFaceDatasetMixin` for internal usage, to detach the `FeedbackDataset` integrations from the class itself, and use Mixins instead (#3326)
- `GET /api/v1/records/{record_id}/suggestions` API endpoint to get the list of suggestions for the responses associated to a record (#3304)
- `POST /api/v1/records/{record_id}/suggestions` API endpoint to create a suggestion for a response associated to a record (#3304)
- `RankingQuestionStrategy`, `RankingQuestionUnification` and the `.for_text_classification` method for the `TrainingTaskMapping` (#3364)
- `PUT /api/v1/records/{record_id}/suggestions` API endpoint to create or update a suggestion for a response associated to a record (#3304 & #3391)
- `suggestions` attribute to `FeedbackRecord`, and allow adding and retrieving suggestions from the Python client (#3370)
- `allowed_for_roles` Python decorator to check whether the current user has the required role to access the decorated function/method for `User` and `Workspace` (#3383)
- `GET /api/v1/me/workspaces` endpoint to list the workspaces of the current active user (#3390)
- `GET /api/v1/datasets/{dataset_id}/records`, `GET /api/v1/me/datasets/{dataset_id}/records`, and `POST /api/v1/me/datasets/{dataset_id}/records/search` endpoints to include the suggestions of the records based on the value of the `include` query parameter (#3304)
- `POST /api/v1/datasets/{dataset_id}/records` input payload to add suggestions (#3304)
- `POST /api/datasets/:dataset-id/:task/bulk` endpoints don't create the dataset if it does not exist (Closes #3244)
- `ArgillaTrainer` (closes #3325)
- `User.workspaces` is no longer an attribute but a property, and calls `list_user_workspaces` to list all the workspace names for a given user ID (#3334)
- `FeedbackDatasetConfig` renamed to `DatasetConfig`, and export/import from YAML as default instead of JSON (just used internally in the `push_to_huggingface` and `from_huggingface` methods of `FeedbackDataset`) (#3326)
- `Dockerfile` parent image from `python:3.9.16-slim` to `python:3.10.12-slim` (#3425)
- `quickstart.Dockerfile` parent image from `elasticsearch:8.5.3` to `argilla/argilla-server:${ARGILLA_VERSION}` (#3425)
- `ARGILLA_` (See #3392)
- `GET /api/v1/me/datasets/{dataset_id}/records` endpoint always returning the responses for the records even if `responses` was not provided via the `include` query parameter (#3304)
- `ArgillaDatasetCard` to include the values/labels for all the existing questions (#3366)

Full Changelog: https://github.com/argilla-io/argilla/compare/v1.12.1...1.13.0
Published by frascuchon over 1 year ago
Published by gabrielmbmb over 1 year ago
`RankingQuestion` in Feedback Task datasets

Now you will be able to include `RankingQuestion`s in your Feedback datasets. These are specially designed to gather feedback on labelers' preferences, by providing a set of options that labelers can order.

Here's how you can add a `RankingQuestion` to a `FeedbackDataset`:
```python
dataset = FeedbackDataset(
    fields=[
        rg.TextField(name="prompt"),
        rg.TextField(name="reply-1", title="Reply 1"),
        rg.TextField(name="reply-2", title="Reply 2"),
        rg.TextField(name="reply-3", title="Reply 3"),
    ],
    questions=[
        rg.RankingQuestion(
            name="ranking",
            title="Order replies based on your preference",
            description="1 = best, 3 = worst. Ties are allowed.",
            required=True,
            values={"reply-1": "Reply 1", "reply-2": "Reply 2", "reply-3": "Reply 3"}  # or ["reply-1", "reply-2", "reply-3"]
        )
    ]
)
```
More info in our docs.
You can now format responses from RatingQuestion
, LabelQuestion
and MultiLabelQuestion
for your preferred training framework using the prepare_for_training
method.
Also, we've added support for spacy-transformers
in our Argilla Trainer.
Here's an example code snippet:
```python
import argilla.feedback as rg

dataset = rg.FeedbackDataset.from_huggingface(
    repo_id="argilla/stackoverflow_feedback_demo"
)

task_mapping = rg.TrainingTaskMapping.for_text_classification(
    text=dataset.field_by_name("question"),
    label=dataset.question_by_name("tags")
)

trainer = rg.ArgillaTrainer(
    dataset=dataset,
    task_mapping=task_mapping,
    framework="spacy-transformers",
    fetch_records=False
)

trainer.update_config(num_train_epochs=2)
trainer.train(output_dir="my_awesome_model")
```
To learn more about how to use Argilla Trainer check our docs.
- `RankingQuestionSettings` class allowing to create ranking questions in the API using the `POST /api/v1/datasets/{dataset_id}/questions` endpoint (#3232)
- `RankingQuestion` in the Python client to create ranking questions (#3275)
- `Ranking` component in the feedback task question form (#3177 & #3246)
- `FeedbackDataset.prepare_for_training` method for generating a framework-specific dataset with the responses provided for `RatingQuestion`, `LabelQuestion` and `MultiLabelQuestion` (#3151)
- `ArgillaSpaCyTransformersTrainer` class for supporting training with `spacy-transformers` (#3256)
- `docker` folder (#3053)
- `release.Dockerfile` has been renamed to `Dockerfile` (#3133)
- `rg.load` function to raise a `ValueError` with an explanatory message for the cases in which the user tries to use the function to load a `FeedbackDataset` (#3289)
- `ArgillaSpaCyTrainer` to allow re-using `tok2vec` (#3256)
- `rg.set_workspace` (Closes #3262)

Full Changelog: https://github.com/argilla-io/argilla/compare/v1.11.0...v1.12.0
Published by alvarobartt over 1 year ago
`owner` role and user update command

We've added a new user role, `owner`, that has permissions over all users, workspaces and datasets in Argilla (like the `admin` role in earlier versions). From this version, the `admin` role will only have permissions over datasets and users in workspaces assigned to them.

You can change a user from admin to owner using a simple CLI command: `python -m argilla users update argilla --role owner`.
You can now get lists of users and workspaces, create new ones and give users access to workspaces directly from the Python SDK. Note that only owners will have permissions for all these actions. Admins will be able to give users access to workspaces where they have access.
You can now add metadata information to your records. This is useful to store information that's not needed for the labeling UI but important for downstream usage (e.g., prompt id, model IDs, etc.)
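A sketch of what that can look like on a record (the metadata keys below are purely illustrative; metadata accepts arbitrary JSON-serializable key-value pairs):

```python
# Illustrative record payload carrying metadata alongside the text.
record = {
    "text": "What is the capital of France?",
    "metadata": {
        "prompt_id": "p-123",          # downstream bookkeeping, not used by the labeling UI
        "model_id": "gpt-3.5-turbo",   # which model produced the text
    },
}
```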
- Replaced the `np.float` alias by `float` to avoid an `AttributeError` when using the `find_label_errors` function with `numpy>=1.24.0` (#3214)
- `format_as("datasets")` when there are no responses or optional responses in `FeedbackRecord`, to set their value to what 🤗 Datasets expects instead of just `None` (#3224)
- `push_to_huggingface()` when `generate_card=True` (default behaviour), as we were passing a sample record to the `ArgillaDatasetCard` class, and `UUID`s, introduced in 1.10.0 (#3192), are not JSON-serializable (#3231)
- `from_argilla` and `push_to_argilla` to ensure consistency on both field and question re-construction, and to ensure `UUID`s are properly serialized as `str`, respectively (#3234)
- `metadata` attribute to the `Record` of the `FeedbackDataset` (#3194)
- `users update` command to update the role for an existing user (#3188)
- `Workspace` class to allow users to manage their Argilla workspaces and the users assigned to those workspaces via the Python client (#3180)
- `User` class to let users manage their Argilla users via the Python client (#3169)
- `tqdm` progress bar in `FeedbackDataset.push_to_argilla` when looping over the records to upload (#3233)
- User roles are now `owner`, `admin` and `annotator` (#3104)
- The `admin` role is scoped to workspace-level operations (#3115)
- An `owner` user is created among the default pool of users in the quickstart, and the default user in the server now has the `owner` role (#3248), reverting (#3188)

Published by frascuchon over 1 year ago
We've added a search bar in the Feedback Task UI so you can filter records based on specific words or phrases.
Annotation guidelines are now rendered as markdown text to make them easier to read and have a more flexible format.
Admin users have access to a Train </>
button in the Feedback Task UI with quick links to all the information needed to train a model with the feedback gathered in Argilla.
- `SearchEngine` and `POST /api/v1/me/datasets/{dataset_id}/records/search` to return the `total` number of records matching the search query (#3166)
- `ArgillaSpanMarkerTrainer` for Named Entity Recognition with `span_marker` v1.1.x onwards
- `ArgillaDatasetCard` import under the `@requires_version` decorator, so that the `ImportError` on `huggingface_hub` is handled properly (#3174)
- `FeedbackDataset.from_argilla` -> `FeedbackDataset.push_to_argilla` under different dataset names and/or workspaces (#3192)

Full Changelog: https://github.com/argilla-io/argilla/compare/v1.9.0...v1.10.0
Published by gabrielmbmb over 1 year ago
We've included two new question types in Feedback Datasets: LabelQuestion
and MultiLabelQuestion
. These are specially useful for applying one or multiple labels to a record, for example, for text classification tasks. In this new view, you can add multiple classification questions and even combine them with the other question types available in Feedback Datasets: RatingQuestion
and TextQuestion
.
You can now add the use_markdown=True
tag to a TextField
or a TextQuestion
to have the UI render the text as markdown. You can use this to read and write code, tables or even add images.
We continue to add improvements to our new Feedback Datasets:
Cards generated with `FeedbackDataset.push_to_huggingface(generate_card=True)` now follow the official Hugging Face template.

- `use_markdown` property to the `TextFieldSettings` model (#3000)
- `use_markdown` property to the `TextQuestionSettings` model (#3000)
- `draft` for the `Response` model (#3033)
- `LabelSelectionQuestionSettings` class allowing to create label selection (single-choice) questions in the API (#3005)
- `MultiLabelSelectionQuestionSettings` class allowing to create multi-label selection (multi-choice) questions in the API (#3010)
- `POST /api/v1/me/datasets/{dataset_id}/records/search` endpoint (#3068)
- `pydantic.BaseModel`s defined at `argilla/client/feedback/schemas.py` (#3137)
- `GET /api/v1/me/datasets/:dataset_id/metrics` output payload to include the count of responses with `draft` status (#3033)
- `alembic` setup to be able to autogenerate revision/migration scripts using SQLAlchemy metadata from Argilla server models (#3044)
- `DatasetCard` generation on `FeedbackDataset.push_to_huggingface` when `generate_card=True`, following the official HuggingFace Hub template, but suited to `FeedbackDataset`s from Argilla (#3110)
- `fields` and `questions` in a `FeedbackDataset` with the same name (#3126)

Published by frascuchon over 1 year ago
In addition, these datasets support multiple annotations: all users with access to the dataset can give their responses.
The FeedbackDataset
has an enhanced integration with the Hugging Face Hub, so that saving a dataset to the Hub or pushing a FeedbackDataset
from the Hub directly to Argilla is seamless.
Check all the things you can do with Feedback Tasks in our docs
We've added a new section in our docs that covers:
We've added new frameworks for the ArgillaTrainer
: ArgillaPeftTrainer
for Text and Token Classification and ArgillaAutoTrainTrainer
for Text Classification.
- `/api/v1/datasets` new endpoint to list and create datasets ([#2615])
- `/api/v1/datasets/{dataset_id}` new endpoint to get and delete datasets ([#2615])
- `/api/v1/datasets/{dataset_id}/publish` new endpoint to publish a dataset ([#2615])
- `/api/v1/datasets/{dataset_id}/questions` new endpoint to list and create dataset questions ([#2615])
- `/api/v1/datasets/{dataset_id}/fields` new endpoint to list and create dataset fields ([#2615])
- `/api/v1/datasets/{dataset_id}/questions/{question_id}` new endpoint to delete a dataset question ([#2615])
- `/api/v1/datasets/{dataset_id}/fields/{field_id}` new endpoint to delete a dataset field ([#2615])
- `/api/v1/workspaces/{workspace_id}` new endpoint to get workspaces by id ([#2615])
- `/api/v1/responses/{response_id}` new endpoint to update and delete a response ([#2615])
- `/api/v1/datasets/{dataset_id}/records` new endpoint to create and list dataset records ([#2615])
- `/api/v1/me/datasets` new endpoint to list user-visible datasets ([#2615])
- `/api/v1/me/dataset/{dataset_id}/records` new endpoint to list dataset records with user responses ([#2615])
- `/api/v1/me/datasets/{dataset_id}/metrics` new endpoint to get the dataset user metrics ([#2615])
- `/api/v1/me/records/{record_id}/responses` new endpoint to create record user responses ([#2615])
- `FeedbackDataset` in the Python client (parent PR [#2615], and nested PRs: [#2949], [#2827], [#2943], [#2945], [#2962], and [#3003])
- `ArgillaPeftTrainer` for text and token classification (#2854)
- `predict_proba()` method to `ArgillaSetFitTrainer`
- `ArgillaAutoTrainTrainer` for Text Classification (#2664)
- `database revisions` command showing database revisions info
- `database migrate` command accepts a `--revision` param to provide a specific revision id

[#2615]: https://github.com/argilla-io/argilla/issues/2615

- `tokens_length` metrics function returns empty data (#3045)
- `token_length` metrics function returns empty data (#3045)
- `mention_length` metrics function returns empty data (#3045)
- `entity_density` metrics function returns empty data (#3045)
- `tokens_length` metrics function has been deprecated and will be removed in 1.10.0 (#3045)
- `token_length` metrics function has been deprecated and will be removed in 1.10.0 (#3045)
- `mention_length` metrics function has been deprecated and will be removed in 1.10.0 (#3045)
- `entity_density` metrics function has been deprecated and will be removed in 1.10.0 (#3045)
- `density`, `tokens_length` and `chars_length` metrics from token classification metrics storage (#3045)
- `char_start`, `char_end`, `tag`, and `score` metrics from token classification metrics storage (#3045)

Published by frascuchon over 1 year ago
Use your data in Argilla to fine-tune OpenAI models. You can do this by getting your data in the specific format through the prepare_for_training
method or train directly using ArgillaTrainer
.
We've added CLI support for Argilla Trainer and two new frameworks for training: `OpenAI` & `SpanMarker`.
We've improved the speed and robustness of the `rg.log` and `rg.load` methods.
typer CLI

A more user-friendly command line interface with `typer` that includes argument suggestions and colorful messages.
- `max_retries` and `num_threads` parameters to `rg.log` to run data logging requests concurrently with a backoff retry policy. See #2458 and #2533
- `rg.load` accepts `include_vectors` and `include_metrics` when loading data. Closes #2398
- `settings` param to `prepare_for_training` (#2689)
- `prepare_for_training` for `openai` (#2658)
- `ArgillaOpenAITrainer` (#2659)
- `ArgillaSpanMarkerTrainer` for Named Entity Recognition (#2693)
- `ArgillaTrainer` CLI support. Closes (#2809)
- `quickstart.requirements.txt`. See #2666
- `id` is present. Closes #2535
- `click` to `typer` CLI support. Closes (#2815)
- `rg.log` computes all batches and raises an error for all failed batches.
- The default batch size for `rg.log` is now 100.
- `argilla.training` bugfixes and unification (#2665)
- `ArgillaTrainer`.
- The `rg.log_async` function is deprecated and will be removed in the next minor release.

We've introduced two user roles to help you manage your annotation team: `admin` and `annotator`. `admin` users can create, list and delete other users, workspaces and datasets. The `annotator` role is specifically designed for users who focus solely on annotating datasets.
We've also added a page to see your user's settings in the Argilla UI. To access it, click on your user avatar at the top right corner and then select `My settings`.
The new `Argilla.training` module deals with all data transformations and basic default configurations to train a model with annotations from Argilla using popular NLP frameworks. It currently supports `spacy`, `setfit` and `transformers`.
Additionally, admin
users can access ready-made code snippets to copy-paste directly from the Argilla UI. Just go to the dataset you want to use, click the </> Train
button in the top banner and select your preferred framework.
Learn more about Argilla.training
in our docs.
Argilla will now create a default SQLite database to store users and workspaces. PostgreSQL is also officially supported. Simply set a custom value for the ARGILLA_DATABASE_URL
environment variable pointing to your PostgreSQL instance.
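For example, pointing Argilla at a PostgreSQL instance is a matter of setting that variable before starting the server (the user, password, host and database name below are placeholders):

```shell
# Placeholder connection string; replace user, password, host and database.
export ARGILLA_DATABASE_URL="postgresql://argilla_user:changeme@localhost:5432/argilla"

# When the variable is unset, Argilla falls back to the default SQLite database.
echo "Using database: ${ARGILLA_DATABASE_URL}"
```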
- `ARGILLA_HOME_PATH` new environment variable (#2564)
- `ARGILLA_DATABASE_URL` new environment variable (#2564)
- User roles `admin` and `annotator` (#2564)
- `id`, `first_name`, `last_name`, `role`, `inserted_at` and `updated_at` new user fields (#2564)
- `/api/users` new endpoint to list and create users (#2564)
- `/api/users/{user_id}` new endpoint to delete users (#2564)
- `/api/workspaces` new endpoint to list and create workspaces (#2564)
- `/api/workspaces/{workspace_id}/users` new endpoint to list workspace users (#2564)
- `/api/workspaces/{workspace_id}/users/{user_id}` new endpoint to create and delete workspace users (#2564)
- `argilla.tasks.users.migrate` new task to migrate users from the old YAML file to the database (#2564)
- `argilla.tasks.users.create` new task to create a user (#2564)
- `argilla.tasks.users.create_default` new task to create a user with default credentials (#2564)
- `argilla.tasks.database.migrate` new task to execute database migrations (#2564)
- `release.Dockerfile` and `quickstart.Dockerfile` now create a default `argilladata` volume to persist data (#2564)
- `Argilla.training` module with support for `spacy`, `setfit`, and `transformers`. Closes #2504
- `prepare_for_training` method is working when `multi_label=True`. Closes #2606
- `ARGILLA_USERS_DB_FILE` environment variable is now only used to migrate users from the YAML file to the database (#2564)
- `full_name` user field is now deprecated; `first_name` and `last_name` should be used instead (#2564)
- `password` user field now requires a minimum of `8` and a maximum of `100` characters in size (#2564)
- `quickstart.Dockerfile` image default users from `team` and `argilla` to `admin` and `annotator`, including new passwords and API keys (#2564)
- `admin` role (#2564)
- `email` user field (#2564)
- `disabled` user field (#2564)
- `ARGILLA_LOCAL_AUTH_DEFAULT_APIKEY` and `ARGILLA_LOCAL_AUTH_DEFAULT_PASSWORD` environment variables. Use `python -m argilla.tasks.users.create_default` instead (#2564)
- `API Key` and `workspace` from the Python client
- `API Key` constant. Closes #2251