argilla

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets

APACHE-2.0 License

Downloads
375.3K
Stars
3.7K
Committers
92

argilla - v1.29.1 Latest Release

Published by frascuchon 3 months ago

What's Changed

Full Changelog: https://github.com/argilla-io/argilla/compare/v1.29.0...v1.29.1

argilla - v2.0.0rc2

Published by frascuchon 4 months ago

What's Changed

Full Changelog: https://github.com/argilla-io/argilla/compare/v2.0.0rc1...v2.0.0rc2

argilla - v2.0.0rc1

Published by frascuchon 4 months ago

🔆 Release highlights

One Dataset to rule them all

The main difference between Argilla 1.x and Argilla 2.x is that we've converted the previous dataset types tailored for specific NLP tasks into a single highly-configurable Dataset class.

With the new Dataset you can combine multiple fields and question types, so you can adapt the UI for your specific project. This offers you more flexibility, while making Argilla easier to learn and maintain.

[!IMPORTANT]
If you want to continue using legacy datasets in Argilla 2.x, you will need to convert them into v2 Datasets as explained in this migration guide. This includes DatasetForTextClassification, DatasetForTokenClassification, and DatasetForText2Text.

FeedbackDataset's do not need to be converted as they are already compatible with the Argilla v2 format.

New SDK

We've redesigned our SDK with the idea to adapt it to the new single Dataset class and, most importantly, improve the user and developer experience.

The main goal of the new design is to make the SDK easier to use and learn, making the process to configure your dataset and get it up and running much simpler and faster.

To learn more about this new SDK, you can check:

New UI layout

We have also revamped our UI for Argilla 2.0:

  • We've redistributed the information in the Home page
  • Datasets don't have Tasks, but Questions.
  • Annotation guidelines and your progress are now accessible at all times within the dataset page.
  • Dataset pages also have a new flexible layout, so you can change the size of different panes and expand or collapse the guidelines and progress.
  • SpanQuestion's are now supported in the bulk view.

https://github.com/argilla-io/argilla/assets/126158523/f77e60de-5824-44ad-8b68-a087b223aa9d

New documentation

This new version of Argilla comes hand-in-hand with a revamped documentation: https://argilla-io.github.io/argilla/latest

We have applied the Diátaxis framework and UX principles with the hope to make this version cleaner and the information easier to find. Let us know what you think!

Share your thoughts with us!

[!NOTE]
This is a release candidate ahead of the official Argilla 2.0 release. Try it out and let us know what you think.
Find us on Discord or open a GitHub issue here.

What's Changed

Full Changelog: https://github.com/argilla-io/argilla/compare/v1.29.0...v2.0.0rc1

argilla - v1.29.0

Published by frascuchon 5 months ago

🔆 Release highlights

[!WARNING]
This will be the last release of Argilla v1. Starting from Argilla 2.0.0, we will only support FeedbackDatasets which will be renamed to Dataset. All other dataset types (DatasetForTextClassification, DatasetForTokenClassification, and DatasetForText2Text) will be deprecated. In the next release, we will provide more information and documentation on how to migrate all your datasets into Argilla 2.0 Datasets.

Improved record search

Your search matches are now highlighted so you can easily see the results of your search. We’ve also added a selector for datasets with more than one record field so you can choose whether to search across all fields or a specific one.

https://github.com/argilla-io/argilla/assets/126158523/b9af3313-a5c3-46b6-83b7-6624662dba04

Record information and metadata in the UI

You can now check all the information and metadata associated with each record directly in the UI.

https://github.com/argilla-io/argilla/assets/126158523/4a3cc4e0-8be7-4927-8d80-8cf84a0dce8b

What's Changed in v1.29.0

New Contributors

Full Changelog: https://github.com/argilla-io/argilla/compare/v1.28.0...v1.29.0

argilla - v1.28.0

Published by jfcalvo 6 months ago

🔆 Release highlights

Improved suggestions

https://github.com/argilla-io/argilla/assets/126158523/380004e0-28cb-409f-b11c-71d0e3b6e8bf

Multiple scores support for MultiLabelQuestion and RankingQuestion

MultiLabelQuestion and RankingQuestion now take one score per suggested label / value, making the scores easier to interpret. Learn more about suggestions and their scores here.

[!WARNING]
If you upgrade to this version all previous scores in suggestions for MultiLabelQuestion, RankingQuestion and SpanQuestion will turn to NULL, as they will not be valid in the new schema. Please, make sure you upload scores again if you want to use them.

See scores next to their label / value

Scores are now shown next to their label / value in all questions. This makes them more visible and easier to interpret.

Suggestions first

Now you can order labels in MultiLabelQuestion so that suggestions are always shown first. This will help you make sure that the most relevant labels are always at hand. Plus, if you’ve added scores to your labels, these will be ordered in descending order. To enable this, go to the Dataset Settings page > Questions and enable “Suggestions first” for the desired question.
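To make the ordering concrete, here is a minimal sketch in plain Python of what "Suggestions first" describes: suggested labels come first, sorted by descending score when scores exist, followed by the remaining labels in their configured order. The function name order_labels and the sample labels are hypothetical; this only illustrates the behavior described above and is not the UI's actual code.

```python
def order_labels(labels, suggested, scores=None):
    """Return labels with suggested ones first, highest score first."""
    if scores is None:
        scores = [None] * len(suggested)
    if len(scores) != len(suggested):
        raise ValueError("expected one score per suggested label")

    # Suggestions without a score sort after scored ones; ties keep input order.
    def sort_key(pair):
        score = pair[1]
        return float("-inf") if score is None else score

    ranked = sorted(zip(suggested, scores), key=sort_key, reverse=True)
    first = [label for label, _ in ranked]
    seen = set(first)
    # Labels that were not suggested keep their configured order.
    rest = [label for label in labels if label not in seen]
    return first + rest


labels = ["sports", "politics", "tech", "health"]
ordered = order_labels(labels, suggested=["tech", "sports"], scores=[0.6, 0.9])
print(ordered)
```

Passing one score per suggested label mirrors the new schema for MultiLabelQuestion suggestions; without scores, the suggested labels simply keep the order in which they were given.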

SpanQuestion improvements

https://github.com/argilla-io/argilla/assets/126158523/fad7b9ca-3890-45ed-acc8-5b038a81db06

Pre-selection highlight

We’ve improved the way selections are shown. You can now see a highlight that represents what the final selection will look like while you’re dragging your mouse. This will help you with the selection speed and show you the difference between the token vs character selection.

[!NOTE]
Remember that character-level spans are activated by holding Shift while doing the selection.

New label selector

We’ve improved the way the label selector works in the SpanQuestion when overlapping spans are enabled so it’s easier to add or correct labels. Simply click on the desired span to activate the selector and click on the label(s) that you want to add or remove.

Persistent storage warning

We’ve added a warning for Argilla instances deployed on Hugging Face Spaces that alerts you to possible data loss when persistent storage is not enabled.

To learn more about this warning and how to disable it, go to our docs.

Changelog 1.28.0

Added

  • Added suggestion multi score attribute. (#4730)
  • Added order by suggestion first. (#4731)
  • Added multi selection entity dropdown for span annotation overlap. (#4735)
  • Added pre selection highlight for span annotation. (#4726)
  • Added banner when persistent storage is not enabled. (#4744)
  • Added support on Python SDK for new multi-label questions labels_order attribute. (#4757)

Changed

  • Changed how the Hugging Face space and user are shown on sign-in. (#4748)

Fixed

  • Fixed Korean character reversed. (#4753)

Fixed

  • Fixed requirements for version of wrapt library conflicting with Python 3.11 (#4693)

Full Changelog: https://github.com/argilla-io/argilla/compare/v1.27.0...v1.28.0

argilla - v1.27.0

Published by damianpumar 6 months ago

🔆 Release highlights

Overlapping spans

We are finally releasing a much expected feature: overlapping spans. This allows you to draw more than one span over the same token(s)/character(s).

https://github.com/argilla-io/argilla/assets/126158523/3aeb6c6c-b348-4b3d-be67-483636c76293

To try them out, set up a SpanQuestion with the argument allow_overlapping=True like this:

dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="text")],
    questions=[
        rg.SpanQuestion(
            name="spans",
            labels=["label1", "label2", "label3"],
            field="text",
            allow_overlapping=True,  # parameter name as listed in the changelog below
        )
    ]
)

Learn more about configuring this and other question types here.

Global progress bars

We’ve included a new column in our home page that shows the global progress of your datasets, so that you can see at a glance which datasets are closer to completion.

These bars show progress by grouping records based on the status of their responses:

  • Submitted: Records where all responses have the submitted status.
  • Discarded: Records where all responses have the discarded status.
  • Conflicting: Records with at least one submitted and one discarded response.
  • Left: All other records that have no submitted or discarded responses. These may be in pending or draft.
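The grouping rules above can be sketched as a small classifier over the response statuses of a record. This is an illustration in plain Python, not the server's implementation; the function name progress_group and the sample data are hypothetical. The notes do not say how a mixed case such as one submitted plus one draft response is counted, so the sketch assumes it falls under "left".

```python
from collections import Counter


def progress_group(statuses):
    """Classify a record from the list of its response statuses."""
    has_submitted = "submitted" in statuses
    has_discarded = "discarded" in statuses
    if has_submitted and has_discarded:
        return "conflicting"
    if statuses and all(status == "submitted" for status in statuses):
        return "submitted"
    if statuses and all(status == "discarded" for status in statuses):
        return "discarded"
    # Assumption: everything else (no responses, drafts, mixed cases) is "left".
    return "left"


# Hypothetical records mapped to the statuses of their responses.
records = {
    "rec-1": ["submitted", "submitted"],
    "rec-2": ["discarded"],
    "rec-3": ["submitted", "discarded"],
    "rec-4": ["draft"],
    "rec-5": [],
}
summary = Counter(progress_group(statuses) for statuses in records.values())
print(dict(summary))
```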

Suggestions got a new look

We’ve improved the way suggestions are shown in the UI to make their purpose clearer: now you can identify each suggestion with a sparkle icon ✨ .

The behavior is still the same:

  • suggested values will appear as pre-filled responses, marked with the sparkle icon.
  • make changes to the incorrect suggestions, then save as a draft or submit.
  • the icon will stay to mark the suggestions so you can compare the final response with the suggested one.

Increased label limits

We’ve increased the limit of labels you can use in Label, Multilabel and Span questions to 500. If you need to go beyond that number, you can set up a custom limit using the following environment variables:

  • ARGILLA_LABEL_SELECTION_OPTIONS_MAX_ITEMS to set the limits in label and multi label questions.
  • ARGILLA_SPAN_OPTIONS_MAX_ITEMS to set the limit in span questions.

[!WARNING]
The UI has been optimized to support up to 1000 labels. If you go beyond this limit, the UI may not be as responsive.

Learn more about this and other environment variables here.

Argilla auf Deutsch!

Thanks to our contributor @paulbauriegel you can now use Argilla fully in German! If that is the main language of your browser, there is nothing you need to do: the UI will automatically detect that and switch to German.

Would you like to translate Argilla to your own language? Reach out to us and we'll help you!

Changelog 1.27.0

Added

  • Added Allow overlap spans in the FeedbackDataset (#4668)
  • Added allow_overlapping parameter for span questions. (#4697)
  • Added overall progress bar on Datasets table (#4696)
  • Added German language translation (#4688)

Changed

  • New UI design for suggestions (#4682)

Fixed

  • Improve performance for more than 250 labels (#4702)

New Contributors

Full Changelog: https://github.com/argilla-io/argilla/compare/v1.26.1...v1.27.0

argilla - v1.26.1

Published by jfcalvo 7 months ago

1.26.1

Added

  • Added support for automatic detection of RTL languages. (#4686)

Full Changelog: https://github.com/argilla-io/argilla/compare/v1.26.0...v1.26.1

argilla - v1.26.0

Published by jfcalvo 7 months ago

🔆 Release highlights

Spans question

We've added a new type of question to Feedback Datasets: the SpanQuestion. This type of question allows you to highlight portions of text in a specific field and apply a label. It is especially useful for token classification (like NER or POS tagging) and information extraction tasks.

https://github.com/argilla-io/argilla/assets/126158523/d3821d49-6da0-4488-99e2-068d7411268a

With this type of question you can:

✨ Provide suggested spans with a confidence score, so your team doesn't need to start from scratch.

⌨️ Choose a label using your mouse or with the keyboard shortcut provided next to the label.

🖱️ Draw a span by dragging your mouse over the parts of the text you want to select or if it's a single token, just double-click on it.

🪄 Forget about mistakes with token boundaries. The UI will snap your spans to token boundaries for you.

🔎 Annotate at character-level when you need more fine-grained spans. Hold the Shift key while drawing the span and the resulting span will start and end in the exact boundaries of your selection.

✔️ Quickly change the label of a span by clicking on the label name and selecting the correct one from the dropdown.

🖍️ Correct a span at the speed of light by simply drawing the correct span over it. The new span will overwrite the old one.

🧼 Remove labels by hovering over the label name in the span and then click on the 𐢫 on the left hand side.
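To illustrate the difference between token-level snapping and character-level selection mentioned above, here is a minimal sketch in plain Python. It assumes simple whitespace tokenization, which may differ from the tokenizer the UI uses, and the function name snap_to_tokens is hypothetical.

```python
import re


def snap_to_tokens(text, start, end):
    """Expand the character span [start, end) so it covers whole tokens."""
    # Assumption: a token is any run of non-whitespace characters.
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    touched = [(s, e) for s, e in tokens if s < end and e > start]
    if not touched:
        return None  # the selection only covered whitespace
    return touched[0][0], touched[-1][1]


text = "Argilla was founded in Madrid"
raw_span = (2, 10)                      # what a character-level selection keeps
snapped = snap_to_tokens(text, *raw_span)
print(text[raw_span[0]:raw_span[1]])    # partial words
print(text[snapped[0]:snapped[1]])      # whole tokens
```

The end offset is exclusive, matching the convention used for spans in the SDK example below (end is the position of the character right after the span).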

Here's an example of what your dataset would look like from the SDK:

import argilla as rg
from argilla.client.feedback.schemas import SpanValueSchema

# connect to your Argilla instance
rg.init(...)

# create a dataset with a span question
dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="text")],
    questions=[
        rg.SpanQuestion(
            name="entities",
            title="Highlight the entities in the text:",
            labels={"PER": "Person", "ORG": "Organization", "EVE": "Event"},  # or ["PER", "ORG", "EVE"]
            field="text", # the field where you want to do the span annotation
            required=True
        )
    ]
)

# create a record with suggested spans
record = rg.FeedbackRecord(
    fields={"text": "This is the text of the record"},
    suggestions = [
        {
            "question_name": "entities",
            "value": [
                SpanValueSchema(
                    start=0, # position of the first character of the span
                    end=10, # position of the character right after the end of the span
                    label="ORG",
                    score=1.0
                )
            ],
            "agent": "my_model",
        }
    ]
)

# add records to the dataset and push to Argilla
dataset.add_records([record])
dataset.push_to_argilla(...)

To learn more about this and all the other questions available in Feedback Datasets, check out our documentation on:

Changelog 1.26.0

Added

  • If you expand the labels of a single or multi label Question, the state is maintained during the entire annotation process. (#4630)
  • Added support for span questions in the Python SDK. (#4617)
  • Added support for span values in suggestions and responses. (#4623)
  • Added span questions for FeedbackDataset. (#4622)
  • Added ARGILLA_CACHE_DIR environment variable to configure the client cache directory. (#4509)

Fixed

  • Fixed contextualized workspaces. (#4665)
  • Fixed prepare for training when passing RankingValueSchema instances to suggestions. (#4628)
  • Fixed parsing ranking values in suggestions from HF datasets. (#4629)
  • Fixed reading description from API response payload. (#4632)
  • Fixed pulling (n*chunk_size)+1 records when using ds.pull or iterating over the dataset. (#4662)
  • Fixed client's resolution of enum values when calling the Search and Metrics api, to support Python >=3.11 enum handling. (#4672)

New Contributors

Full Changelog: https://github.com/argilla-io/argilla/compare/v1.25.0...v1.26.0

argilla - v1.25.0

Published by frascuchon 8 months ago

🔆 Release highlights

Reorder labels

admin and owner users can now change the order in which labels appear in the question form. To do this, go to the Questions tab inside Dataset Settings and move the labels until they are in the desired order.

https://github.com/argilla-io/argilla/assets/126158523/40f382a5-35c6-4bea-b15c-f001f539940d

Aligned SDK status filter

The missing status has been removed from the SDK filters. To filter records that don't have responses you will now need to use the pending status like so:

filtered_dataset = dataset.filter_by(response_status="pending")

Learn more about how to use this filter in our docs

Pandas 2.0 support

We’ve removed the limitation to use pandas <2.0.0 so you can now use Argilla with pandas v1 or v2 safely.

Changelog 1.25.0

[!NOTE]
For changes in the argilla-server module, visit the argilla-server release notes

Added

  • Reorder labels in dataset settings page for single/multi label questions (#4598)
  • Added pandas v2 support using the python SDK. (#4600)

Removed

  • Removed missing response for status filter. Use pending instead. (#4533)

Fixed

  • Fixed FloatMetadataProperty: value is not a valid float (#4570)
  • Fixed redirect to user-settings instead of 404 user_settings (#4609)

New Contributors

Full Changelog: https://github.com/argilla-io/argilla/compare/v1.24.0...v1.25.0

argilla - v1.24.0

Published by frascuchon 9 months ago

[!Note]
This release does not contain any new features, but it includes a major change in the argilla server.
The package is using the argilla-server dependency defined here.

Full Changelog: https://github.com/argilla-io/argilla/compare/v1.23.1...v1.24.0

argilla - v1.23.1

Published by frascuchon 9 months ago

1.23.1

Fixed

  • Fixed Responsive view for Feedback Datasets. (#4579)

New Contributors

Full Changelog: https://github.com/argilla-io/argilla/compare/v1.23.0...v1.23.1

argilla - v1.23.0

Published by jfcalvo 9 months ago

🔆 Release highlights

Hugging Face OAuth

You can now set up OAuth in your Argilla Hugging Face Spaces. This is a simple way to have your team members or collaborators in crowdsourced projects sign up and log in to your Space using their Hugging Face accounts.

To learn how to set up Hugging Face OAuth for your Argilla Space, go to our docs.

Bulk actions for filter results

We’ve added an improvement for our bulk view so you can perform actions on all results from a filter (or a combination of them!).

To use this, go to the bulk view and apply some filter(s) of your choice. If the results are more than the records seen in the current page, when you click the checkbox you will see the option to select all of the results. Then, you can give responses, discard, save a draft and even submit all of the records at once!

Screenshot of the UI with the bulk selector for all filter results

Embed PDFs in a TextField

We’ve added the pdf_to_html function in our utilities so you can easily embed a PDF reader within a TextField using markdown.

This function accepts the file path, a URL, or the file's byte data, and returns the corresponding HTML to render the PDF within the Argilla user interface.

Learn more about how to use this feature here.
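As a rough illustration of the idea, the sketch below turns PDF bytes into a data URL and wraps it in an HTML tag that a markdown-enabled field could render. It is not the implementation of pdf_to_html: the helper name pdf_bytes_to_html, the choice of the object tag, and the default sizes are assumptions made for this example.

```python
import base64


def pdf_bytes_to_html(data: bytes, width: str = "700px", height: str = "700px") -> str:
    """Embed PDF bytes in an HTML snippet via a base64 data URL."""
    if not data.startswith(b"%PDF"):
        raise ValueError("data does not look like a PDF file")
    encoded = base64.b64encode(data).decode("ascii")
    data_url = f"data:application/pdf;base64,{encoded}"
    return (
        f'<object data="{data_url}" type="application/pdf" '
        f'width="{width}" height="{height}"></object>'
    )


# Hypothetical input: real usage would read the bytes from a file.
html = pdf_bytes_to_html(b"%PDF-1.4 example bytes")
print(html[:60])
```

Because the whole file is inlined into the record, large PDFs make records correspondingly heavy, so this approach suits small documents best.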

Changelog 1.23.0

Added

  • Added bulk annotation by filter criteria. (#4516)
  • Automatically fetch new datasets on focus tab. (#4514)
  • API v1 responses returning Record schema now always include dataset_id as attribute. (#4482)
  • API v1 responses returning Response schema now always include record_id as attribute. (#4482)
  • API v1 responses returning Question schema now always include dataset_id attribute. (#4487)
  • API v1 responses returning Field schema now always include dataset_id attribute. (#4488)
  • API v1 responses returning MetadataProperty schema now always include dataset_id attribute. (#4489)
  • API v1 responses returning VectorSettings schema now always include dataset_id attribute. (#4490)
  • Added pdf_to_html function to .html_utils module that converts PDFs to dataURL to be able to render them in the Argilla UI. (#4481)
  • Added ARGILLA_AUTH_SECRET_KEY environment variable. (#4539)
  • Added ARGILLA_AUTH_ALGORITHM environment variable. (#4539)
  • Added ARGILLA_AUTH_TOKEN_EXPIRATION environment variable. (#4539)
  • Added ARGILLA_AUTH_OAUTH_CFG environment variable. (#4546)
  • Added OAuth2 support for HuggingFace Hub. (#4546)

Deprecated

  • Deprecated ARGILLA_LOCAL_AUTH_* environment variables. Will be removed in the release v1.25.0. (#4539)

Changed

  • Changed regex pattern for username attribute in UserCreate. Now uppercase letters are allowed. (#4544)

Removed

  • Remove sending Authorization header from python SDK requests. (#4535)

Fixed

  • Fixed keyboard shortcut for label questions. (#4530)

New Contributors

Full Changelog: https://github.com/argilla-io/argilla/compare/v1.22.0...v1.23.0

argilla - v1.22.0

Published by frascuchon 9 months ago

🔆 Release Highlights

Bulk actions in Feedback Task datasets

Our signature bulk actions are now available for Feedback datasets!

https://user-images.githubusercontent.com/126158523/297772506-97d83a54-ea3f-4700-acd6-ff9e349ade63.mp4

Switch between Focus and Bulk depending on your needs:

  • In the Focus view, you can navigate and respond to records individually. This is ideal for closely examining and giving responses to each record.
  • The Bulk view allows you to see multiple records on the same page. You can select all or some of them and perform actions in bulk, such as applying a label, saving responses, submitting, or discarding. You can use this feature along with filters and similarity search to process a list of records in bulk.

For now, this is only available in the Pending queue, but rest assured, bulk actions will be improved and extended to other queues in upcoming releases.

Read more about our Focus and Bulk views here.

Sorting rating values

We now support sorting records in the Argilla UI based on the values of Rating questions (both suggestions and responses):
Screenshot of the sorting by Rating question value options

Learn about this and other filters in our docs.

Out-of-the-box embedding support

It’s now easier than ever to add vector embeddings to your records with the new Sentence Transformers integration.

Just choose a model from the Hugging Face hub and use our SentenceTransformersExtractor to add vectors to your dataset:

import argilla as rg
from argilla.client.feedback.integrations.sentencetransformers import SentenceTransformersExtractor

# Connect to Argilla
rg.init(
    api_url="http://localhost:6900",
    api_key="owner.apikey",
    workspace="my_workspace"
)

# Initialize the SentenceTransformersExtractor
ste = SentenceTransformersExtractor(
    model = "TaylorAI/bge-micro-v2", # Use a model from https://huggingface.co/models?library=sentence-transformers
    show_progress = False,
)

# Load a dataset from your Argilla instance
ds_remote = rg.FeedbackDataset.from_argilla("my_dataset")

# Update the dataset
ste.update_dataset(
    dataset=ds_remote,
    fields=["context"], # Only update the context field
    update_records=True, # Update the records in the dataset
    overwrite=False, # Overwrite existing fields
)

Learn more about this functionality in this tutorial.

Changelog 1.22.0

Added

  • Added Bulk annotation support. (#4333)
  • Restore filters from feedback dataset settings. (#4461)
  • Warning on feedback dataset settings when leaving page with unsaved changes. (#4461)
  • Added pydantic v2 support using the python SDK. (#4459)
  • Added vector_settings to the __repr__ method of the FeedbackDataset and RemoteFeedbackDataset. (#4454)
  • Added integration for sentence-transformers using SentenceTransformersExtractor to configure vector_settings in FeedbackDataset and FeedbackRecord. (#4454)

Changed

  • Module argilla.cli.server definitions have been moved to argilla.server.cli module. (#4472)
  • [breaking] Changed vector_settings_by_name for generic property_by_name usage, which will return None instead of raising an error. (#4454)
  • The constant definition ES_INDEX_REGEX_PATTERN in module argilla._constants is now private. (#4472)
  • nan values in metadata properties will raise a 422 error when creating/updating records. (#4300)
  • None values are now allowed in metadata properties. (#4300)
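Since nan values in metadata now produce a 422 error while None is accepted, one way to prepare records is to replace nan with None before creating or updating them. The following is a minimal sketch in plain Python; the helper name clean_metadata and the sample metadata are hypothetical, and whether to send None or drop the key entirely is a choice for your project.

```python
import math


def clean_metadata(metadata):
    """Return a copy of the metadata with float nan values replaced by None."""
    cleaned = {}
    for key, value in metadata.items():
        if isinstance(value, float) and math.isnan(value):
            cleaned[key] = None  # None is allowed, nan is rejected
        else:
            cleaned[key] = value
    return cleaned


raw = {"length": 42, "score": float("nan"), "source": "web"}
print(clean_metadata(raw))
```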

Fixed

  • Paginating to a new record, automatically scrolls down to selected form area. (#4333)

Deprecated

  • The missing response status for filtering records is deprecated and will be removed in the release v1.24.0. Use pending instead. (#4433)

Removed

  • The deprecated python -m argilla database command has been removed. (#4472)

New Contributors

Full Changelog: https://github.com/argilla-io/argilla/compare/v1.21.0...v1.22.0

argilla - v1.21.0

Published by damianpumar 10 months ago

1.21.0

Added

  • Added new draft queue for annotation view (#4334)
  • Added annotation metrics module for the FeedbackDataset (argilla.client.feedback.metrics). (#4175).
  • Added strategy to handle and translate errors from the server for 401 HTTP status code (#4362)
  • Added integration for textdescriptives using TextDescriptivesExtractor to configure metadata_properties in FeedbackDataset and FeedbackRecord. (#4400). Contributed by @m-newhauser
  • Added POST /api/v1/me/responses/bulk endpoint to create responses in bulk for current user. (#4380)
  • Added list support for term metadata properties. (Closes #4359)
  • Added new CLI task to reindex datasets and records into the search engine. (#4404)
  • Added httpx_extra_kwargs argument to rg.init and Argilla to allow passing extra arguments to httpx.Client used by Argilla. (#4440)

Changed

  • More productive and simpler shortcuts system (#4215)
  • Move ArgillaSingleton, init and active_client to a new module singleton. (#4347)
  • Updated argilla.load functions to also work with FeedbackDatasets. (#4347)
  • [breaking] Updated argilla.delete functions to also work with FeedbackDatasets. It now raises an error if the dataset does not exist. (#4347)
  • Updated argilla.list_datasets functions to also work with FeedbackDatasets. (#4347)

Fixed

  • Fixed error in TextClassificationSettings.from_dict method in which the label_schema created was a list of dict instead of a list of str. (#4347)
  • Fixed total records on pagination component (#4424)

Removed

  • Removed draft auto save for annotation view (#4334)

argilla - v1.20.0

Published by davidberenstein1957 11 months ago

🔆 Release highlights

Responses and suggestions filters

We’ve added new filters in the Argilla UI to filter records within Feedback datasets based on response values and suggestions information. It is also possible to sort records based on suggestion scores. This is available for questions of the type: LabelQuestion, MultiLabelQuestion and RatingQuestion.

Screenshot of the response and suggestions filters

Screenshot of the suggestion score sort

Utils module

Assign records

We added utilities to assign records to annotators with a controlled overlap: assign_records and assign_workspaces.

from argilla.client.feedback.utils import assign_records

assignments = assign_records(
    users=users,
    records=records,
    overlap=1,
    shuffle=True
)

from argilla.client.feedback.utils import assign_workspaces

assignments = assign_workspaces(
    assignments=assignments,
    workspace_type="individual"
)

for username, records in assignments.items():
    dataset = rg.FeedbackDataset(
        fields=fields, questions=questions, metadata=metadata,
        vector_settings=vector_settings, guidelines=guidelines
    )
    dataset.add_records(records)
    remote_dataset = dataset.push_to_argilla(name="my_dataset", workspace=username)
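To show what "controlled overlap" means, here is a minimal sketch in plain Python that gives every record to exactly overlap different annotators while keeping the load balanced. It does not reproduce how assign_records works internally; the function name assign_with_overlap, the seed parameter, and the round-robin strategy are assumptions made for this example.

```python
import random


def assign_with_overlap(users, records, overlap=1, shuffle=True, seed=None):
    """Assign each record to `overlap` distinct users in round-robin order."""
    if not 1 <= overlap <= len(users):
        raise ValueError("overlap must be between 1 and the number of users")
    records = list(records)
    if shuffle:
        random.Random(seed).shuffle(records)
    assignments = {user: [] for user in users}
    for index, record in enumerate(records):
        # Consecutive positions modulo len(users) are distinct while overlap <= len(users).
        for offset in range(overlap):
            user = users[(index * overlap + offset) % len(users)]
            assignments[user].append(record)
    return assignments


demo = assign_with_overlap(["ana", "ben", "cleo"], range(6), overlap=2, shuffle=False)
for user, assigned in demo.items():
    print(user, assigned)
```

With three users, six records and an overlap of 2, every record is annotated twice and each user receives four records.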

Multi-Modal DataURLs for images, video and audio

Argilla supports basic handling of video, audio, and images within markdown fields, provided they are formatted in HTML. To facilitate this, we offer three functions: video_to_html, audio_to_html, and image_to_html. Note that performance differs per browser and database configuration.

from argilla.client.feedback.utils import audio_to_html, image_to_html, video_to_html

# Configure the FeedbackDataset
ds_multi_modal = rg.FeedbackDataset(
    fields=[rg.TextField(name="content", use_markdown=True, required=True)],
    questions=[rg.TextQuestion(name="description", title="Describe the content of the media:", use_markdown=True, required=True)],
)
# Add the records
records = [
    rg.FeedbackRecord(fields={"content": video_to_html("/content/snapshot.mp4")}),
    rg.FeedbackRecord(fields={"content": audio_to_html("/content/sea.wav")}),
    rg.FeedbackRecord(fields={"content": image_to_html("/content/peacock.jpg")}),
]
ds_multi_modal.add_records(records)
# Push the dataset to Argilla
ds_multi_modal = ds_multi_modal.push_to_argilla("multi-modal-basic", workspace="admin")

Token Highlights

You can also add custom highlights to the text by using create_token_highlights and a custom color map.

from argilla.client.feedback.utils import create_token_highlights

tokens = ["This", "is", "a", "test"]
weights = [0.1, 0.2, 0.3, 0.4]
html = create_token_highlights(tokens, weights, c_map=custom_RGB) # 'viridis' by default
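For intuition about what such a helper produces, the sketch below builds HTML in plain Python where each token gets a background whose opacity follows its weight. It is not the code behind create_token_highlights: the function name highlight_tokens, the single base color, and the scaling by the largest weight are assumptions made for this example.

```python
from html import escape


def highlight_tokens(tokens, weights, rgb=(255, 200, 0)):
    """Wrap each token in a span whose background opacity follows its weight."""
    if len(tokens) != len(weights):
        raise ValueError("tokens and weights must have the same length")
    top = max(weights, default=0) or 1.0  # avoid dividing by zero
    red, green, blue = rgb
    spans = []
    for token, weight in zip(tokens, weights):
        alpha = max(weight, 0) / top  # scale to the 0..1 range
        style = f"background-color: rgba({red}, {green}, {blue}, {alpha:.2f})"
        spans.append(f'<span style="{style}">{escape(token)}</span>')
    return " ".join(spans)


tokens = ["This", "is", "a", "test"]
weights = [0.1, 0.2, 0.3, 0.4]
highlighted = highlight_tokens(tokens, weights)
print(highlighted)
```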

1.20.0 Changelog

Added

  • Added GET /api/v1/datasets/:dataset_id/records/search/suggestions/options endpoint to return suggestion available options for searching. (#4260)
  • Added metadata_properties to the __repr__ method of the FeedbackDataset and RemoteFeedbackDataset.(#4192).
  • Added get_model_kwargs, get_trainer_kwargs, get_trainer_model, get_trainer_tokenizer and get_trainer methods to the ArgillaTrainer to improve interoperability across frameworks. (#4214).
  • Added additional formatting checks to the ArgillaTrainer to allow for better interoperability of defaults and formatting_func usage. (#4214).
  • Added a warning to the update_config method of ArgillaTrainer to emphasize if the kwargs were updated correctly. (#4214).
  • Added argilla.client.feedback.utils module with html_utils (this mainly includes video/audio/image_to_html, which convert media to dataURL to be able to render them in the Argilla UI, and create_token_highlights to highlight tokens in a custom way. Both work on TextQuestion and TextField with use_markdown=True) and assignments (this mainly includes assign_records to assign records according to a number of annotators and records, an overlap and the shuffle option; and assign_workspaces to assign and create if needed a workspace according to the record assignment). (#4121)

Fixed

  • Fixed error in ArgillaTrainer, with numerical labels, using RatingQuestion instead of RankingQuestion (#4171)
  • Fixed error in ArgillaTrainer, now we can train for extractive_question_answering using a validation sample (#4204)
  • Fixed error in ArgillaTrainer, when training for sentence-similarity it didn't work with a list of values per record (#4211)
  • Fixed error in the unification strategy for RankingQuestion (#4295)
  • Fixed TextClassificationSettings.labels_schema order was not being preserved. Closes #3828 (#4332)
  • Fixed error when requesting non-existing API endpoints. Closes #4073 (#4325)
  • Fixed error when passing draft responses to create records endpoint. (#4354)

Changed

  • [breaking] Suggestions agent field now only accepts some specific characters and a limited length. (#4265)
  • [breaking] Suggestions score field now only accepts float values in the range 0 to 1. (#4266)
  • Updated POST /api/v1/dataset/:dataset_id/records/search endpoint to support optional query attribute. (#4327)
  • Updated POST /api/v1/dataset/:dataset_id/records/search endpoint to support filter and sort attributes. (#4327)
  • Updated POST /api/v1/me/datasets/:dataset_id/records/search endpoint to support optional query attribute. (#4270)
  • Updated POST /api/v1/me/datasets/:dataset_id/records/search endpoint to support filter and sort attributes. (#4270)
  • Changed the logging style while pulling and pushing FeedbackDataset to Argilla from tqdm style to rich. (#4267). Contributed by @zucchini-nlp.
  • Updated push_to_argilla to print repr of the pushed RemoteFeedbackDataset after push and changed show_progress to True by default. (#4223)
  • Changed models and tokenizer for the ArgillaTrainer to explicitly allow for changing them when needed. (#4214).
argilla - v1.19.0

Published by davidberenstein1957 11 months ago

🔆 Release highlights

🚨 Breaking changes

We have chosen to stop raising a ValueError in the FeedbackDataset.*_by_name() methods: FeedbackDataset.question_by_name(), FeedbackDataset.field_by_name() and FeedbackDataset.metadata_property_by_name(). Instead, these methods now return None when no match is found. This change is backwards compatible with previous versions of Argilla, but it might break your code if you rely on the ValueError being raised.
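
If your code relied on the exception, one option is to wrap the lookup in a small helper that fails fast again. The sketch below is only an illustration: StubDataset and require_field are hypothetical names that stand in for a real FeedbackDataset so the example runs without an Argilla server.

```python
from typing import Optional


class StubDataset:
    """Hypothetical stand-in that mimics the 1.19.0 lookup behaviour."""

    def __init__(self, field_names):
        self._fields = {name: {"name": name} for name in field_names}

    def field_by_name(self, name: str) -> Optional[dict]:
        # Mirrors the new behaviour: return None instead of raising ValueError.
        return self._fields.get(name)


def require_field(dataset, name: str) -> dict:
    """Restore fail-fast behaviour for callers that expected a ValueError."""
    field = dataset.field_by_name(name)
    if field is None:
        raise ValueError(f"Field {name!r} not found in dataset")
    return field


ds = StubDataset(["prompt", "response"])
print(require_field(ds, "prompt"))   # found: the field is returned
print(ds.field_by_name("missing"))   # not found: None, no exception
```

The same pattern applies to question_by_name() and metadata_property_by_name(): check for None explicitly instead of catching ValueError.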

Similarity search

If you have included vectors and vector settings in your dataset, you can use the similarity search features within that dataset.

In the Argilla UI, you can find records that are similar to each other using the Find similar button at the top right corner of the record card.

In the SDK, you can do the same like this:

ds = rg.FeedbackDataset.from_argilla("my_dataset", workspace="my_workspace")

# using another record
similar_records = ds.find_similar_records(
    vector_name="my_vector",
    record=ds.records[0],
    max_results=5
)

# work with the resulting tuples
for record, score in similar_records:
    ...

You can also find records that are similar to a given text, but bear in mind that the dimensions of the resulting vector should be equal to that of the vector used in the dataset records:

similar_records = ds.find_similar_records(
    vector_name="my_vector",
    value=embedder_model.embeddings("My text is here")
    # value=embedder_model.embeddings("My text is here").tolist() # for numpy arrays
)
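
To make the dimension requirement concrete, here is a minimal pure-Python sketch of cosine similarity, which the 1.19.0 changelog names as the similarity measure. It is not Argilla's implementation (the actual search runs in the search engine); the function name cosine_similarity and the sample vectors are illustrative assumptions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal dimension."""
    if len(a) != len(b):
        # A query vector must have the same dimensions as the stored vectors.
        raise ValueError(f"Dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


query = [1.0, 0.0, 1.0]
stored = [0.5, 0.0, 0.5]
print(cosine_similarity(query, stored))  # same direction, so close to 1.0
```

Because cosine similarity only compares direction, scaling a vector does not change its score; a vector with the wrong number of dimensions cannot be compared at all.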

Add vectors to your FeedbackDataset

You can now add vectors to your Feedback dataset and records to enable similarity search.

To do that, first, you need to add vector settings to your dataset:

dataset = rg.FeedbackDataset(
    fields=[...],
    questions=[...],
    vector_settings=[
        rg.VectorSettings(
            name="my_vectors",
            dimensions=768,
            title="My Vectors"  # optional
        )
    ]
)

Then, you can add vectors to your records where the key matches the name of your vector settings and the value is a List[float]:

record = rg.FeedbackRecord(
    fields={...},
    vectors={"my_vectors": [...]}
)

⚠️ For vector search in OpenSearch, the filtering applied is using a post_filter step, since there is a bug that makes queries fail using filtering + KNN from Argilla.
See https://github.com/opensearch-project/k-NN/issues/1286



FeedbackDataset

We added a show_progress argument to from_huggingface() method to make the progress bar for the parsing records process optional.

RemoteFeedbackDataset

We have added additional support for the pull() method of RemoteFeedbackDataset. It is now possible to pull a RemoteFeedbackDataset with a specific max_records argument. In combination with the previously introduced filter_by and sort_by, this allows for more fine-grained control over the records that are pulled from Argilla.

ArgillaTrainer

The ArgillaTrainer class has been updated to support additional features. Hugging Face models can now be shared to the Hugging Face Hub directly with the ArgillaTrainer.push_to_huggingface method. Additionally, we have added filter_by, sort_by, and max_records arguments to the ArgillaTrainer initialization method to allow for more fine-grained control over the records used for training.

from argilla import SortBy
from argilla.feedback import ArgillaTrainer

trainer = ArgillaTrainer(
    dataset=dataset,
    task=task,
    framework="setfit",
    filter_by={"response_status": ["submitted"]},
    sort_by=[SortBy(field="metadata.my-metadata", order="asc")],
    max_records=1000
)

🎨 UI improvements

  • We have changed the layout of the filters for a slimmer and more flexible component that will host more filter types in the future without being disruptive.
  • We have fixed a small UI bug where larger svg-images were pushed out of the visible screen, leading to a bad user experience.
  • There is sorting support based on inserted_at and updated_at datetime fields.

1.19.0 Changelog

Added

  • Added POST /api/v1/datasets/:dataset_id/records/search endpoint to search for records without user context, including responses by all users. (#4143)
  • Added POST /api/v1/datasets/:dataset_id/vectors-settings endpoint for creating vector settings for a dataset. (#3776)
  • Added GET /api/v1/datasets/:dataset_id/vectors-settings endpoint for listing the vectors settings for a dataset. (#3776)
  • Added DELETE /api/v1/vectors-settings/:vector_settings_id endpoint for deleting a vector settings. (#3776)
  • Added PATCH /api/v1/vectors-settings/:vector_settings_id endpoint for updating a vector settings. (#4092)
  • Added GET /api/v1/records/:record_id endpoint to get a specific record. (#4039)
  • Added support to include vectors for GET /api/v1/datasets/:dataset_id/records endpoint response using include query param. (#4063)
  • Added support to include vectors for GET /api/v1/me/datasets/:dataset_id/records endpoint response using include query param. (#4063)
  • Added support to include vectors for POST /api/v1/me/datasets/:dataset_id/records/search endpoint response using include query param. (#4063)
  • Added show_progress argument to from_huggingface() method to make the progress bar for parsing records process optional.(#4132).
  • Added a progress bar for parsing records process to from_huggingface() method with trange in tqdm.(#4132).
  • Added support for sorting by inserted_at or updated_at for datasets with no metadata. (#4147)
  • Added max_records argument to pull() method for RemoteFeedbackDataset.(#4074)
  • Added functionality to push your models to the Hugging Face hub with ArgillaTrainer.push_to_huggingface (#3976). Contributed by @Racso-3141.
  • Added filter_by argument to ArgillaTrainer to filter by response_status (#4120).
  • Added sort_by argument to ArgillaTrainer to sort by metadata (#4120).
  • Added max_records argument to ArgillaTrainer to limit the records used for training (#4120).
  • Added add_vector_settings method to local and remote FeedbackDataset. (#4055)
  • Added update_vectors_settings method to local and remote FeedbackDataset. (#4122)
  • Added delete_vectors_settings method to local and remote FeedbackDataset. (#4130)
  • Added vector_settings_by_name method to local and remote FeedbackDataset. (#4055)
  • Added find_similar_records method to local and remote FeedbackDataset. (#4023)
  • Added ARGILLA_SEARCH_ENGINE environment variable to configure the search engine to use. (#4019)

Changed

  • [breaking] Remove support for Elasticsearch < 8.5 and OpenSearch < 2.4. (#4173)
  • [breaking] Users working with OpenSearch engines must use version >=2.4 and set ARGILLA_SEARCH_ENGINE=opensearch. (#4019 and #4111)
  • [breaking] Changed FeedbackDataset.*_by_name() methods to return None when no match is found (#4101).
  • [breaking] limit query parameter for GET /api/v1/datasets/:dataset_id/records endpoint now only accepts values greater than or equal to 1 and less than or equal to 1000. (#4143)
  • [breaking] limit query parameter for GET /api/v1/me/datasets/:dataset_id/records endpoint now only accepts values greater than or equal to 1 and less than or equal to 1000. (#4143)
  • Update GET /api/v1/datasets/:dataset_id/records endpoint to fetch record using the search engine. (#4142)
  • Update GET /api/v1/me/datasets/:dataset_id/records endpoint to fetch record using the search engine. (#4142)
  • Update POST /api/v1/datasets/:dataset_id/records endpoint to allow creating records with vectors (#4022)
  • Update PATCH /api/v1/datasets/:dataset_id endpoint to allow updating allow_extra_metadata attribute. (#4112)
  • Update PATCH /api/v1/datasets/:dataset_id/records endpoint to allow updating records with vectors. (#4062)
  • Update PATCH /api/v1/records/:record_id endpoint to allow updating a record with vectors. (#4062)
  • Update POST /api/v1/me/datasets/:dataset_id/records/search endpoint to allow searching records with vectors. (#4019)
  • Update BaseElasticAndOpenSearchEngine.index_records method to also index record vectors. (#4062)
  • Update FeedbackDataset.__init__ to allow passing a list of vector settings. (#4055)
  • Update FeedbackDataset.push_to_argilla to also push vector settings. (#4055)
  • Update FeedbackDatasetRecord to support the creation of records with vectors. (#4043)
  • Using cosine similarity to compute similarity between vectors. (#4124)
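
Since the limit query parameter of the record listing endpoints is now bounded to the range 1 to 1000, clients that fetch many records need to page through them. The sketch below only computes the (offset, limit) pairs for such a loop; the helper name page_requests is hypothetical, and pairing limit with an offset parameter is an assumption about how a client would call the endpoint.

```python
from typing import Iterator, Tuple

MIN_LIMIT = 1
MAX_LIMIT = 1000  # upper bound accepted for `limit` since 1.19.0


def page_requests(total: int, page_size: int = MAX_LIMIT) -> Iterator[Tuple[int, int]]:
    """Yield (offset, limit) pairs that cover `total` records."""
    if not MIN_LIMIT <= page_size <= MAX_LIMIT:
        raise ValueError(f"page_size must be between {MIN_LIMIT} and {MAX_LIMIT}")
    offset = 0
    while offset < total:
        # The last page may be smaller than page_size.
        limit = min(page_size, total - offset)
        yield offset, limit
        offset += limit


for offset, limit in page_requests(2500):
    print(offset, limit)
```

Validating the page size on the client side surfaces the new bounds before a request is sent.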

Fixed

  • Fixed svg images out of screen with too large images (#4047)
  • Fixed creating records with responses from multiple users. Closes #3746 and #3808 (#4142)
  • Fixed deleting or updating responses as an owner for annotators. (Commit 403a66d)
  • Fixed passing user_id when getting records by id. (Commit 98c7927)
  • Fixed non-basic tags serialized when pushing a dataset to the Hugging Face Hub. Closes #4089 (#4200)

Contributors

  • @Racso-3141 Added a progress bar for parsing records process to from_huggingface() method with trange in tqdm.(#4132).
argilla - v1.18.0

Published by frascuchon 12 months ago

🔆 Release highlights

💾 Add metadata properties to Feedback Datasets

You can now filter and sort records in Feedback Datasets in the UI and Python SDK using the metadata included in the records. To do that, you will first need to set up a MetadataProperty in your dataset:

# set up a dataset including metadata properties
dataset = rg.FeedbackDataset(
    fields=[
        rg.TextField(name="prompt"),
        rg.TextField(name="response"),
    ],
    questions=[
        rg.TextQuestion(name="question")
    ],
    metadata_properties=[
        rg.TermsMetadataProperty(name="source"),
        rg.IntegerMetadataProperty(name="response_length", title="Response length")
    ]
)

Learn more about how to define metadata properties or adding or deleting metadata properties in existing datasets.

This will read the metadata in the records that match the name of the metadata property. Any other metadata present in the record not matching a metadata property will be saved but not available to use in the filtering and sorting features in the UI or SDK.

# create a record with metadata
record = rg.FeedbackRecord(
    fields={
        "prompt": "Why can camels survive long without water?",
        "response": "Camels use the fat in their humps to keep them filled with energy and hydration for long periods of time."
    },
    metadata={"source": "wikipedia", "response_length": 105, "my_hidden_metadata": "hidden metadata"}
)

Learn more about how to create records with metadata and how to add, modify or delete metadata from existing records.

🗃️ Filter and sort records using metadata in Feedback Datasets

In the Python SDK, you can filter and sort records based on the Metadata Properties that you set up for your dataset. You can combine multiple filters and sorts. Here is an example of how you could use them:

filtered_records = remote.filter_by(
    metadata_filters=[
        rg.IntegerMetadataFilter(
            name="response_length",
            ge=500, # optional: greater or equal to
            le=1000 # optional: lower or equal to
        ),
        rg.TermsMetadataFilter(
            name="source", 
            values=["wikipedia", "wikihow"]
        )
    ]
).sort_by(
    [
        rg.SortBy(
            field="response_length",
            order="desc"  # "desc" for descending or "asc" for ascending
        )
    ]
)
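
To illustrate what such a query selects, here is a plain-Python sketch that applies the same conditions to a list of metadata dictionaries. It runs without Argilla; the sample records are made up, and combining the two filters with a logical AND is an assumption about how multiple filters interact.

```python
# Made-up metadata for four records.
records = [
    {"source": "wikipedia", "response_length": 750},
    {"source": "wikihow", "response_length": 980},
    {"source": "forum", "response_length": 600},
    {"source": "wikipedia", "response_length": 105},
]


def matches(metadata, ge=500, le=1000, values=("wikipedia", "wikihow")):
    """Integer range filter AND terms filter, as in the example above."""
    in_range = ge <= metadata["response_length"] <= le
    in_terms = metadata["source"] in values
    return in_range and in_terms


# Keep matching records, then sort by response_length in descending order.
selected = sorted(
    (m for m in records if matches(m)),
    key=lambda m: m["response_length"],
    reverse=True,
)
print(selected)
```

Only the two records that satisfy both conditions remain, ordered from the longest to the shortest response.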

In the UI, simply use the Metadata and Sort components to filter and sort records like this:

https://github.com/argilla-io/argilla/assets/126158523/6a5a7984-425d-4f1a-b0f7-7cc2bb7e4a0a

Read more about filtering and sorting in Feedback Datasets.

⚠️ Breaking change using SQLite as backend in a docker deployment

From version 1.17.0 a new argilla OS user is configured for the provided docker images. If you are using the docker deployment and you want to upgrade to this version from a version older than v1.17.0, you should change the permissions of the SQLite db file before upgrading. (If you already upgraded from v1.17.0, this step was already applied; see the v1.17.0 release notes.) You can do it with the following command:

docker exec --user root <argilla_server_container_id> /bin/bash -c 'chmod -R 777 "$ARGILLA_HOME_PATH"'

Note: You can find the docker container id by running:

docker ps  | grep -i argilla-server
713973693fb7   argilla/argilla-server:v1.16.0                "/bin/bash start_arg…"   11 hours ago   Up 7 minutes       0.0.0.0:6900->6900/tcp                           docker-argilla-1

Once the version is upgraded, we recommend restoring proper security on this folder by setting the user and group to the new argilla user:

docker exec --user root <argilla_server_container_id>  /bin/bash -c 'chown -R argilla:argilla "$ARGILLA_HOME_PATH"'

1.18.0 Changelog

Added

  • New GET /api/v1/datasets/:dataset_id/metadata-properties endpoint for listing dataset metadata properties. (#3813)
  • New POST /api/v1/datasets/:dataset_id/metadata-properties endpoint for creating dataset metadata properties. (#3813)
  • New PATCH /api/v1/metadata-properties/:metadata_property_id endpoint allowing the update of a specific metadata property. (#3952)
  • New DELETE /api/v1/metadata-properties/:metadata_property_id endpoint for deletion of a specific metadata property. (#3911)
  • New GET /api/v1/metadata-properties/:metadata_property_id/metrics endpoint to compute metrics for a specific metadata property. (#3856)
  • New PATCH /api/v1/records/:record_id endpoint to update a record. (#3920)
  • New PATCH /api/v1/dataset/:dataset_id/records endpoint to bulk update the records of a dataset. (#3934)
  • Added missing validations to PATCH /api/v1/questions/:question_id. Now title and description use the same validations used to create questions. (#3967)
  • Added TermsMetadataProperty, IntegerMetadataProperty and FloatMetadataProperty classes to define metadata properties for a FeedbackDataset. (#3818)
  • Added metadata_filters to filter_by method in RemoteFeedbackDataset to filter based on metadata i.e. TermsMetadataFilter, IntegerMetadataFilter, and FloatMetadataFilter. (#3834)
  • Added a validation layer for both metadata_properties and metadata_filters in their schemas and as part of the add_records and filter_by methods, respectively. (#3860)
  • Added sort_by query parameter to listing records endpoints to sort the records by inserted_at, updated_at or a metadata property. (#3843)
  • Added add_metadata_property method to both FeedbackDataset and RemoteFeedbackDataset (i.e. FeedbackDataset in Argilla). (#3900)
  • Added fields inserted_at and updated_at in RemoteResponseSchema. (#3822)
  • Added support for sort_by for RemoteFeedbackDataset i.e. a FeedbackDataset uploaded to Argilla. (#3925)
  • Added metadata_properties support for both push_to_huggingface and from_huggingface. (#3947)
  • Add support for update records (metadata) from Python SDK. (#3946)
  • Added delete_metadata_properties method to delete metadata properties. (#3932)
  • Added update_metadata_properties method to update metadata_properties. (#3961)
  • Added automatic model card generation through ArgillaTrainer.save (#3857)
  • Added FeedbackDataset TaskTemplateMixin for pre-defined task templates. (#3969)
  • A maximum limit of 50 on the number of options a ranking question can accept. (#3975)
  • New last_activity_at field to FeedbackDataset exposing when the last activity for the associated dataset occurs. (#3992)

Changed

  • GET /api/v1/datasets/{dataset_id}/records, GET /api/v1/me/datasets/{dataset_id}/records and POST /api/v1/me/datasets/{dataset_id}/records/search endpoints to return the total number of records. (#3848, #3903)
  • Implemented __len__ method for filtered datasets to return the number of records matching the provided filters. (#3916)
  • Increase the default max result window for Elasticsearch created for Feedback datasets. (#3929)
  • Force elastic index refresh after records creation. (#3929)
  • Validate metadata fields for filtering and sorting in the Python SDK. (#3993)
  • Using metadata property name instead of id for indexing data in search engine index. (#3994)

Fixed

  • Fixed response schemas to allow values to be None i.e. when a record is discarded the response.values are set to None. (#3926)

New Contributors

Full Changelog: https://github.com/argilla-io/argilla/compare/v1.17.0...v1.18.0

argilla - v1.17.0

Published by frascuchon about 1 year ago

☀️ Highlights

This release comes with a lot of new goodies and quality improvements. We added model card support for the ArgillaTrainer, worked on the FeedbackDataset task templates and added timestamps to responses. We also fixed a lot of bugs and improved the overall quality of the codebase. Enjoy!

🚨 Breaking change in updating existing Hugging Face Spaces deployments

The quickstart image startup script was changed from /start_quickstart.sh to /home/argilla/start_quickstart.sh, which might cause existing Hugging Face Spaces deployments to malfunction. A fix was added for the Argilla template space via this PR. Alternatively, you can just create a new deployment.

⚠️ Breaking change using SQLite as backend in a docker deployment

From version 1.17.0 a new argilla OS user is configured for the provided docker images. If you are using the docker deployment and you want to upgrade to this version, run the following command after updating your container and before working with Argilla:

docker exec --user root <argilla_server_container_id> /bin/bash -c 'chown -R argilla:argilla "$ARGILLA_HOME_PATH"'

This will change the permissions on the argilla home path, which allows it to work with new containers.

Note: You can find the docker container id by running:

docker ps  | grep -i argilla-server
713973693fb7   argilla/argilla-server:v1.17.0                "/bin/bash start_arg…"   11 hours ago   Up 7 minutes       0.0.0.0:6900->6900/tcp                           docker-argilla-1

💾 ArgillaTrainer Model Card Generation

The ArgillaTrainer now supports automatic model card generation. This means that you can now generate a model card with all the required info for Hugging Face and directly share these models to the hub, as you would expect within the Hugging Face ecosystem. See the docs for more info.

model_card_kwargs = {
    "language": ["en", "es"],
    "license": "Apache-2.0",
    "model_id": "all-MiniLM-L6-v2",
    "dataset_name": "argilla/emotion",
    "tags": ["nlp", "few-shot-learning", "argilla", "setfit"],
    "model_summary": "Small summary of what the model does",
    "model_description": "An extended explanation of the model",
    "model_type": "A 1.3B parameter embedding model fine-tuned on an awesome dataset",
    "finetuned_from": "all-MiniLM-L6-v2",
    "repo": "https://github.com/...",
    "developers": "",
    "shared_by": "",
}

trainer = ArgillaTrainer(
    dataset=dataset,
    task=task,
    framework="setfit",
    framework_kwargs={"model_card_kwargs": model_card_kwargs}
)
trainer.train(output_dir="my_model")
# or get the card as `str` by calling the `generate_model_card` method
argilla_model_card = trainer.generate_model_card("my_model")

🦮 FeedbackDataset Task Templates

The Argilla FeedbackDataset now supports a number of task templates that can be used to quickly create a dataset for specific tasks out of the box. This should help new users get right into the action without having to worry about the dataset structure. We support basic tasks like Text Classification but also allow you to set up complex RAG pipelines. See the docs for more info.

import argilla as rg

ds = rg.FeedbackDataset.for_text_classification(
    labels=["positive", "negative"],
    multi_label=False,
    use_markdown=True,
    guidelines=None,
)
ds
# FeedbackDataset(
#   fields=[TextField(name="text", use_markdown=True)],
#   questions=[LabelQuestion(name="label", labels=["positive", "negative"])],
#   guidelines="<Guidelines for the task>",
# )

⏱️ inserted_at and updated_at are added to responses

What are responses without timestamps? The RemoteResponseSchema now supports inserted_at and updated_at fields. This should help you keep track of when a response was created and updated. Perfect for keeping track of annotator performance within your company.

1.17.0

Added

  • Added fields inserted_at and updated_at in RemoteResponseSchema (#3822).
  • Added automatic model card generation through ArgillaTrainer.save (#3857).
  • Added task templates to the FeedbackDataset (#3973).

Changed

  • Updated Dockerfile to use multi stage build (#3221 and #3793).
  • Updated active learning for text classification notebooks to use the most recent small-text version (#3831).
  • Changed argilla dataset name in the active learning for text classification notebooks to be consistent with the default names in the huggingface spaces (#3831).
  • FeedbackDataset API methods have been aligned to be accessible through the several implementations (#3937).
  • The unify_responses support for remote datasets (#3937).

Fixed

  • Fix field not shown in the order defined in the dataset settings. Closes #3959 (#3984)
  • Updated active learning for text classification notebooks to pass ids of type int to TextClassificationRecord (#3831).
  • Fixed record fields validation that was preventing logging records with optional fields (i.e. required=False) when the field value was None (#3846).
  • Always set pretrained_model_name_or_path attribute as string in ArgillaTrainer (#3914).
  • The inserted_at and updated_at attributes are created using the utcnow factory to avoid unexpected race conditions on timestamp creation (#3945)
  • Fixed configure_dataset_settings when providing the workspace via the arg workspace (#3887).
  • Fixed saving of models trained with ArgillaTrainer with a peft_config parameter (#3795).
  • Fixed backwards compatibility on from_huggingface when loading a FeedbackDataset from the Hugging Face Hub that was previously dumped using another version of Argilla, starting at 1.8.0, when it was first introduced (#3829).
  • Fixed TrainingTaskForQuestionAnswering.__repr__ (#3969)
  • Fixed potential dictionary key-errors in TrainingTask.prepare_for_training_with_*-methods (#3969)

Deprecated

  • Function rg.configure_dataset is deprecated in favour of rg.configure_dataset_settings. The former will be removed in version 1.19.0

New Contributors

Full Changelog: https://github.com/argilla-io/argilla/compare/v1.16.0...v1.17.0

argilla - v1.16.0

Published by gabrielmbmb about 1 year ago

☀️ Highlights

This release comes with an auto save feature for the UI, an enhanced Argilla CLI app, new keyboard shortcuts for the annotation process in the Feedback Dataset and new integrations for the ArgillaTrainer.

💾 Auto save

Argilla UI Feedback Record getting auto saved

Have you been writing a long corrected text in a TextField for a completion given by an LLM and you have refreshed the page before submitting it? Well, since this release you are covered! The Argilla UI will save every few seconds the responses given in the annotation form of a FeedbackDataset. Annotators can partially annotate one record and then come back to finish the annotation process without losing the previous work.

👨🏻‍💻 More operations directly from the Argilla CLI

Argilla CLI displaying help information

The Argilla CLI has been updated to include an extensive list of new commands, from users and datasets management to training models all from the terminal!

⌨️ New keyboard shortcuts for the Feedback Dataset

Feedback dataset shortcuts

Now, you can seamlessly navigate within the feedback form using just your keyboard. We've extended the functionality of these shortcuts to cover all types of available questions: Label, Multi-label, Ranking, Rating and Text.

QnA, Chat Completion with OpenAI and Sentence Transformers model training now in the ArgillaTrainer

The ArgillaTrainer doesn't stop getting new features and improvements!

1.16.0

Added

  • Added ArgillaTrainer integration with sentence-transformers, allowing fine tuning for sentence similarity (#3739)
  • Added ArgillaTrainer integration with TrainingTask.for_question_answering (#3740)
  • Added Auto save record to save automatically the current record that you are working on (#3541)
  • Added ArgillaTrainer integration with OpenAI, allowing fine tuning for chat completion (#3615)
  • Added workspaces list command to list Argilla workspaces (#3594).
  • Added datasets list command to list Argilla datasets (#3658).
  • Added users create command to create users (#3667).
  • Added whoami command to get current user (#3673).
  • Added users delete command to delete users (#3671).
  • Added users list command to list users (#3688).
  • Added workspaces delete-user command to remove a user from a workspace (#3699).
  • Added workspaces create command to create an Argilla workspace (#3676).
  • Added datasets push-to-hub command to push a FeedbackDataset from Argilla into the HuggingFace Hub (#3685).
  • Added info command to get info about the used Argilla client and server (#3707).
  • Added datasets delete command to delete a FeedbackDataset from Argilla (#3703).
  • Added created_at and updated_at properties to RemoteFeedbackDataset and FilteredRemoteFeedbackDataset (#3709).
  • Added handling PermissionError when executing a command with a logged in user with not enough permissions (#3717).
  • Added workspaces add-user command to add a user to workspace (#3712).
  • Added workspace_id param to GET /api/v1/me/datasets endpoint (#3727).
  • Added workspace_id arg to list_datasets in the Python SDK (#3727).
  • Added argilla script to execute the Argilla CLI using the argilla command (#3730).
  • Added server_info function to check the Argilla server information (also accessible via rg.server_info) (#3772).

Changed

  • Move database commands under server group of commands (#3710)
  • server commands only included in the CLI app when server extra requirements are installed (#3710).
  • Updated PUT /api/v1/responses/{response_id} to replace values stored with received values in request (#3711).
  • Display a UserWarning when the user_id in Workspace.add_user and Workspace.delete_user is the ID of a user with the owner role, as they don't require explicit permissions (#3716).
  • Rename tasks sub-package to cli (#3723).
  • Changed argilla database command in the CLI to now be accessed via argilla server database, to be deprecated in the upcoming release (#3754).
  • Changed visible_options (of label and multi label selection questions) validation in the backend to check that the provided value is greater than or equal to 3 and less than or equal to the number of provided options (#3773).

Fixed

  • Fixed remove user modification in text component on clear answers (#3775)
  • Fixed Highlight raw text field in dataset feedback task (#3731)
  • Fixed Field title too long (#3734)
  • Fixed error messages when deleting a DatasetForTextClassification (#3652)
  • Fixed Pending queue pagination problems during data annotation (#3677)
  • Fixed visible_labels default value to be 20 just when visible_labels not provided and len(labels) > 20, otherwise it will either be the provided visible_labels value or None, for LabelQuestion and MultiLabelQuestion (#3702).
  • Fixed DatasetCard generation when RemoteFeedbackDataset contains suggestions (#3718).
  • Added missing draft status in ResponseSchema, as there can now be responses with draft status when annotating via the UI (#3749).
  • Fixed searches when queried words are distributed along the record fields (#3759).
  • Fixed Python 3.11 compatibility issue with /api/datasets endpoints due to the TaskType enum replacement in the endpoint URL (#3769).
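
The visible_labels default described in the list above (#3702) can be restated as a small function. This is only a sketch of the rule as worded in the changelog entry, not the SDK code; resolve_visible_labels and DEFAULT_VISIBLE_LABELS are hypothetical names.

```python
from typing import Optional, Sequence

DEFAULT_VISIBLE_LABELS = 20


def resolve_visible_labels(
    labels: Sequence[str], visible_labels: Optional[int] = None
) -> Optional[int]:
    """Return the effective visible_labels for a label question."""
    if visible_labels is not None:
        # An explicitly provided value always wins.
        return visible_labels
    if len(labels) > DEFAULT_VISIBLE_LABELS:
        # Default to 20 only when there are more than 20 labels.
        return DEFAULT_VISIBLE_LABELS
    return None


many = [f"label_{i}" for i in range(25)]
few = ["positive", "negative"]
print(resolve_visible_labels(many))      # falls back to the default of 20
print(resolve_visible_labels(few))       # stays None
print(resolve_visible_labels(many, 10))  # keeps the provided value
```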

As always, thanks to our amazing contributors

Full Changelog: https://github.com/argilla-io/argilla/compare/v1.15.1...v1.16.0
