A new reliable, localizable, and generalizable metric for hallucination detection in image captioning models.
Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned...
A state-of-the-art-level open visual language model | multimodal pre-trained model
Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)
A multi-task model that performs image captioning, sentence paraphrasing, and cross-modal retrieval.
Using LLMs and pre-trained caption models for super-human performance on image captioning.
Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models
Code accompanying the paper Pretraining Language Models with Human Preferences
Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in PyTorch
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities...
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + ...
I decided to sync up this repo and self-critical.pytorch. (The old master is in old master branch ...
CLAIR: A (surprisingly) simple semantic text metric with large language models.
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image capt...
[EMNLP'23] The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Languag...