xmodaler

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).

OTHER License

Ecosystems: Python

Issue Statistics

Time to Close Issues: about 1 hour (Past Year), about 2 months (All Time)

Related Projects

muse-maskgit-pytorch

Implementation of Muse: Text-to-Image Generation via Masked Generative Transformers, in Pytorch

03 Jan 2023 855

F-LMM

Code Release of F-LMM: Grounding Frozen Large Multimodal Models

28 Mar 2024 28

x-clip

A concise but complete implementation of CLIP with various experimental improvements from recent ...

01 Dec 2021 681

MODNet

A Trimap-Free Portrait Matting Solution in Real Time [AAAI 2022]

23 Nov 2020 3,786

OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities...

29 Jan 2022 2,401

InternImage

[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable...

10 Nov 2022 2,486

CVPR2024-Papers-with-Code

A collection of CVPR 2024 papers and open-source projects

26 Feb 2020 17,384

MetaTransformer

Meta-Transformer for Unified Multimodal Learning

08 Jul 2023 1,506

mm-cot

Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tune...

02 Feb 2023 3,760

VisCPM

[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | Based on the CPM foundation...

30 Jun 2023 1,075

phenaki-pytorch

Implementation of Phenaki Video, which uses Mask GIT to produce text guided videos of up to 2 min...

29 Sep 2022 746

caption-by-committee

Using LLMs and pre-trained caption models for super-human performance on image captioning.

14 Dec 2022 27

Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

23 Oct 2023 2,881

DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

05 Jan 2021 5,563

awesome-foundation-and-multimodal-models

👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + ...

08 Oct 2023 518