donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

MIT License

Downloads

3.6K

Stars

5.7K

Committers

Ecosystems: Python

Commit Statistics

Past Year

All Time

Total Commits

Total Committers

Avg. Commits Per Committer

6.86

Bot Commits

Issue Statistics

Past Year

All Time

Total Pull Requests

Merged Pull Requests

Total Issues

233

Time to Close Issues

about 1 month

25 days

Package Rankings

Top 6.29% on PyPI (pypi.org)

Related Projects

yolov7_d2

🔥🔥🔥🔥 (Earlier YOLOv7 not official one) YOLO with Transformers and Instance Segmentation, with Ten...

23 Jun 2021 3,125

IDM-VTON

IDM-VTON : Improving Diffusion Models for Authentic Virtual Try-on in the Wild

20 Mar 2024 2,513

LongLoRA

Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)

21 Sep 2023 2,607

CogVideo

Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

29 May 2022 7,818

cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

17 Jun 2024 1,703

coyo-dataset

COYO-700M: Large-scale Image-Text Pair Dataset

30 Aug 2022 1,080

PixArt-alpha

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

12 Oct 2023 2,138

stylegan-t

[ICML'23] StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis

20 Jan 2023 1,146

CogVLM

A state-of-the-art-level open visual language model | multimodal pre-trained model

18 Sep 2023 5,913

prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

02 Mar 2023 1,295

CLAP

Contrastive Language-Audio Pretraining

06 Mar 2022 1,358

ckiptagger

CKIP Neural Chinese Word Segmentation, POS Tagging, and NER

23 Aug 2019 1,632

Marigold

[CVPR 2024] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation

27 Nov 2023 1,590

voxceleb_trainer

In defence of metric learning for speaker recognition

26 Mar 2020 1,031

MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

26 Mar 2024 2,774