[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
APACHE-2.0 License
Mamba state-space model
Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tune...
Code Release of F-LMM: Grounding Frozen Large Multimodal Models
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. A near-GPT-4o-performance open-source multi...
Collection of CVPR 2024 papers and open-source projects
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2....
A state-of-the-art open visual language model | Multimodal pretrained model
[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | Based on the CPM base mod...
VILA - a multi-image visual language model with training, inference and evaluation recipe, deploy...
[ECCV 2024] Official repository of "GenView: Enhancing View Quality with Pretrained Generative Mo...
EfficientViT is a new family of vision models for efficient high-resolution vision.
Official repository of "Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-...
sensAI: ConvNets Decomposition via Class Parallelism for Fast Inference on Live Data
MambaOut: Do We Really Need Mamba for Vision?