Align 3D Point Cloud with Multi-modalities for Large Language Models
MIT License
A state-of-the-art open visual language model | multimodal pretrained model
Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2....
[ICCV 2023 Oral] Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction
[EMNLP 2024] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
ImageBind One Embedding Space to Bind Them All
[ICLR'24 Spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | based on the CPM foundation mo...
RandLA-Net in TensorFlow (CVPR 2020 Oral & IEEE TPAMI 2021)
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
The Most Faithful Implementation of Segment Anything (SAM) in 3D
Code Release of F-LMM: Grounding Frozen Large Multimodal Models
Mixture-of-Experts for Large Vision-Language Models
VILA - a multi-image visual language model with training, inference and evaluation recipe, deploy...
[CVPR 2024 Highlight] Monkey (LMM): Image Resolution and Text Label Are Important Things for Large...