VMZ: Model Zoo for Video Modeling
APACHE-2.0 License
Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposit...
Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tune...
Non-local Neural Networks for Video Classification
PyTorch implementation of our method for high-resolution (e.g. 2048x1024) photorealistic video-to...
[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space...
VILA - a multi-image visual language model with training, inference and evaluation recipe, deploy...
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. Approaching GPT-4o performance, an open-source multi...
A Python library built to empower developers to build applications and systems with self-contain...
[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | Built on the CPM foundation...
Gluon CV Toolkit
Deep Learning Model Zoo
Codebase for Image Classification Research, written in PyTorch.
Official repo for VGen: a holistic video generation ecosystem for video generation building on di...
Code repo for realtime multi-person pose estimation in CVPR'17 (Oral)