Code for our WOAH@ACL 2021 Paper on Data Integration for Toxic Comment Classification: Making More Than 40 Datasets Easily Accessible in One Unified Format
MIT License
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These an...
Search all Chinese NLP datasets, with commonly used English NLP datasets included
A tool that locates, downloads, and extracts machine translation corpora
Repository to track the progress in Natural Language Processing (NLP), including the datasets and...
Basically SentEval with German language downstream tasks
TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model trainin...
Code accompanying the paper Pretraining Language Models with Human Preferences
Organized Resources for Deep Learning in Natural Language Processing
Public release of the TransCoder research project https://arxiv.org/pdf/2006.03511.pdf
🧪Yet Another ICU Benchmark: a holistic framework for the standardization of clinical prediction m...
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + ...
General Assembly's 2015 Data Science course in Washington, DC
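A small 0.2B Chinese dialogue model (ChatLM-Chinese-0.2B), with all code open-sourced for the full pipeline: dataset sources, data cleaning, tokenizer training, model pretraining, SFT instruction fine-tuning, RLHF optimization, and more. Supports downstream task sf...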
Automated Deep Learning without ANY human intervention. 1st solution for the AutoDL challenge@NeurIPS.