Latte: Latent Diffusion Transformer for Video Generation.
APACHE-2.0 License
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
VILA - a multi-image visual language model with training, inference and evaluation recipe, deploy...
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by ...
Official repo for VGen: a holistic video generation ecosystem for video generation building on di...
Official PyTorch code for "Enhancing Diffusion Models with Text-Encoder Reinforcement Learning",...
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
DeepFloyd-IF (Imagen Free)
An open platform for training, serving, and evaluating large language models. Release repo for Vi...
Official code for DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source, near-GPT-4o-performance multi...
The official PyTorch implementation of our paper "Is Space-Time Attention All You Need for Video ...
[ECCV 2024 Oral] LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation.
LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA a...