Grounded Language-Image Pre-training
MASS: Masked Sequence to Sequence Pre-training for Language Generation
The implementation of DeBERTa
A Multi-Task Dataset for Simulated Humanoid Control
OTOv1-v3, NeurIPS, ICLR, TMLR, DNN Training, Compression, Structured Pruning, Erasing Operators, ...
This repository contains resources for accessing the official benchmarks, codes, and checkpoints ...
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and la...
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
JARVIS, a system to connect LLMs with the ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
VideoX: a collection of video cross-modal models
Dedicated to building industrial foundation models for universal data intelligence across industr...
Bringing Old Photos Back to Life (CVPR 2020 oral)
Large Language-and-Vision Assistant for Biomedicine, built towards multimodal GPT-4-level capabil...
Large-scale pretraining for dialogue
Set-of-Mark Prompting for GPT-4V and LMMs