This repo contains the code for generating the ToxiGen dataset, published at ACL 2022.
GRadient-INformed MoE
Dedicated to building industrial foundation models for universal data intelligence across industries
CodeBERT
Large-scale pretraining for dialogue
JARVIS, a system to connect LLMs with the ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
A framework for standardizing evaluations of large foundation models, beyond single-score reporting
A Multi-Task Dataset for Simulated Humanoid Control
MASS: Masked Sequence to Sequence Pre-training for Language Generation
NOTSOFAR-1 Challenge: Distant Diarization and ASR
Automatic Generation of Visualizations and Infographics using Large Language Models
This repository contains resources for accessing the official benchmarks, code, and checkpoints ...
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing