Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | based on the CPM foundation...
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities...
【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large...
Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing>
A PyTorch implementation of "Real-time Scene Text Detection with Differentiable Binarization".
Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language M...
Code Release of F-LMM: Grounding Frozen Large Multimodal Models
[CVPR 2024 Highlight] Official PyTorch implementation of CoDeF: Content Deformation Fields for Te...
PyTorch codes for "Iterative Token Evaluation and Refinement for Real-World Super-Resolution", AA...
Zero-1-to-3: Zero-shot One Image to 3D Object (ICCV 2023)
Official codes of DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior
a state-of-the-art-level open visual language model | multimodal pretrained model
The OCR approach is rephrased as Segmentation Transformer: https://arxiv.org/abs/1909.11065. This...
LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA a...