Mixture-of-Experts for Large Vision-Language Models
APACHE-2.0 License
Use `google/siglip-so400m-patch14-384` as the vision encoder for a more detailed visual understanding.
Set `capacity_factor` to 1.5 to support a stronger MoE-LLaVA.

We hope community researchers will take note that large vision-language models can also be sparsified, and can even perform better as a result.
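
For readers unfamiliar with `capacity_factor`, the following is a minimal sketch of how a capacity factor is commonly turned into a per-expert token budget in sparse MoE layers. It is not taken from the MoE-LLaVA code: the function name `expert_capacity`, its parameters, and the token and expert counts in the usage lines are hypothetical and chosen only for illustration.

```python
import math


def expert_capacity(num_tokens, num_experts, capacity_factor, top_k=1, min_capacity=0):
    """Per-expert token budget under a common capacity-factor formulation.

    Assumption: each token is routed to `top_k` experts, so a perfectly
    balanced router sends num_tokens * top_k / num_experts tokens to each
    expert. The capacity factor scales that balanced load to leave headroom
    for imbalance; tokens beyond the budget are typically dropped or passed
    through the residual connection.
    """
    balanced_load = num_tokens * top_k / num_experts
    capacity = math.ceil(balanced_load * capacity_factor)
    return max(capacity, min_capacity)


if __name__ == "__main__":
    # Illustrative numbers only: 576 tokens, 4 experts, top-2 routing.
    for cf in (1.0, 1.5):
        print(f"capacity_factor={cf}: {expert_capacity(576, 4, cf, top_k=2)} tokens per expert")
```

Under this formulation, raising the factor from 1.0 to 1.5 lets each expert accept 50% more tokens than the perfectly balanced share, which reduces token dropping at the cost of extra compute and memory.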