toxic-comment-collection

Code for our WOAH@ACL 2021 paper "Data Integration for Toxic Comment Classification: Making More Than 40 Datasets Easily Accessible in One Unified Format"

MIT License

Ecosystems: Python

Commit Statistics

Past Year

All Time

Total Commits

102

Total Committers

Avg. Commits Per Committer

14.57

Bot Commits

Issue Statistics

Past Year

All Time

Total Pull Requests

Merged Pull Requests

Total Issues

Time to Close Issues: N/A (past year), 4 months (all time)

Package Rankings

Top 18.49% on Pypi.org

Related Projects

entity-recognition-datasets

A collection of corpora for named entity recognition (NER) and entity recognition tasks. These an...

01 Sep 2018 1,495

CLUEDatasetSearch

Search all Chinese NLP datasets, with commonly used English NLP datasets included

21 Feb 2020 4,106

mtdata

A tool that locates, downloads, and extracts machine translation corpora

06 Apr 2020 139

NLP-progress

Repository to track the progress in Natural Language Processing (NLP), including the datasets and...

22 Jun 2018 22,559

sentence-embedding-evaluation-german

Basically SentEval with German language downstream tasks

08 Apr 2022 0

TextAttack

TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model trainin...

15 Oct 2019 2,920

OffensEval2020-code

03 May 2020 12

pretraining-with-human-feedback

Code accompanying the paper Pretraining Language Models with Human Preferences

20 Feb 2023 167

Deep-Learning-NLP

Organized Resources for Deep Learning in Natural Language Processing

17 Sep 2017 432

TransCoder

Public release of the TransCoder research project https://arxiv.org/pdf/2006.03511.pdf

10 Jul 2020 1,688

YAIB

🧪Yet Another ICU Benchmark: a holistic framework for the standardization of clinical prediction m...

15 Aug 2022 49

awesome-foundation-and-multimodal-models

👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + ...

08 Oct 2023 518

DAT8

General Assembly's 2015 Data Science course in Washington, DC

07 Aug 2015 1,606

ChatLM-mini-Chinese

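A small 0.2B-parameter Chinese dialogue model (ChatLM-Chinese-0.2B), with all code open-sourced for the full pipeline: dataset sources, data cleaning, tokenizer training, model pretraining, SFT instruction fine-tuning, RLHF optimization, and more. Supports downstream task sf...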

27 Aug 2023 1,166

AutoDL

Automated Deep Learning without ANY human intervention. 1st solution for AutoDL challenge@NeurIPS.

02 Apr 2020 1,131