MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

APACHE-2.0 License

Stars

1.9K

Ecosystems: Python

MoE-LLaVA - Release v1.0.0 Latest Release

Published by LinB203 9 months ago

Added support for higher-resolution input, using google/siglip-so400m-patch14-384 as the vision encoder for more detailed visual understanding.
Changed capacity_factor to 1.5 for a stronger MoE-LLaVA.
Added MME benchmark results and the evaluation pipeline.
Improved docs.
Fixed typos.
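
To illustrate what the capacity_factor change means in practice, here is a minimal sketch of how a capacity factor is commonly turned into a per-expert token budget in MoE routing. It is not taken from the MoE-LLaVA code: the function name `expert_capacity`, the `min_capacity` parameter, and the token and expert counts are assumptions chosen for illustration, and the actual project may compute capacity differently (for example, scaling by the number of experts each token is routed to).

```python
import math


def expert_capacity(num_tokens: int, num_experts: int,
                    capacity_factor: float = 1.5,
                    min_capacity: int = 0) -> int:
    """Hypothetical helper: per-expert token budget for one batch.

    Assumes the common formula ceil(tokens / experts * capacity_factor).
    Tokens routed to an expert beyond this budget are typically dropped
    or passed through unchanged, depending on the implementation.
    """
    capacity = math.ceil(num_tokens / num_experts * capacity_factor)
    return max(capacity, min_capacity)


# Illustrative numbers only: 1024 tokens shared among 4 experts.
balanced = expert_capacity(1024, 4, capacity_factor=1.0)
relaxed = expert_capacity(1024, 4, capacity_factor=1.5)
print(balanced, relaxed)
```

Under this formula, a factor of 1.0 leaves room only for a perfectly balanced assignment, while 1.5 gives each expert 50% headroom, so fewer tokens overflow when routing is uneven, at the cost of more memory and compute per expert.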

We hope the research community will take note that large vision-language models can also be sparsified and can even perform better.

Package Rankings

Top 6.75% on Proxy.golang.org

Badges

Extracted from project README

Replicate demo and cloud API

GitHub issues

GitHub closed issues

Related Projects

VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deploy...

23 Feb 2024 1,061

MiniGPT-4

Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2....

15 Apr 2023 25,326

FlagEmbedding

Retrieval and Retrieval-augmented LLMs

02 Aug 2023 6,718

OpenMoE

A family of open-sourced Mixture-of-Experts (MoE) Large Language Models

08 Aug 2023 1,368

GLM-4

GLM-4 series: Open Multilingual Multimodal Chat LMs

15 May 2024 4,747

cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

17 Jun 2024 1,703

LLMZoo

⚡LLM Zoo is a project that provides data, models, and evaluation benchmark for large language mod...

01 Apr 2023 2,927

minimind

[Large Models] Train a small 26M-parameter GPT entirely from scratch in 3 hours; inference and training need as little as a 2 GB GPU!

27 Jul 2024 2,087

PixArt-alpha

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

12 Oct 2023 2,138

long_llama

LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA a...

06 Jul 2023 1,448

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. Approaching GPT-4o performance, an open-source multi...

22 Nov 2023 5,641

Video-LLaVA

[EMNLP 2024] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

23 Oct 2023 2,881

MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

26 Mar 2024 2,774

MiniCPM

MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.

29 Jan 2024 5,590

CogVLM

A state-of-the-art-level open visual language model | multimodal pretrained model

18 Sep 2023 5,913