mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multi... with performance approaching GPT-4o
LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA a...
VILA - a multi-image visual language model with training, inference and evaluation recipe, deploy...
Code Release of F-LMM: Grounding Frozen Large Multimodal Models
OpenMMLab Foundational Library for Training Deep Learning Models
Alibaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab
[LLM] Train a small 26M-parameter GPT completely from scratch in 3 hours; inference and training need as little as a 2 GB GPU!
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + ...
A state-of-the-art open visual language model | multimodal pre-trained model
A Comprehensive Toolkit for High-Quality PDF Content Extraction
【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large...
Mixture-of-Experts for Large Vision-Language Models
[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | based on the CPM base mo...
A collection of CVPR 2024 papers and open-source projects