MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising
OTHER License
MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation
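This project uses several advanced voiceprint recognition models, including EcapaTdnn, ResNetSE, ERes2Net, and CAM++, and also supports MelSpectrogram, Spectrogram, MFCC, Fban...
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. Approaching GPT-4o performance, an open-source multi...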
Official repo for VGen: a holistic video generation ecosystem for video generation building on di...
A collection of CVPR 2024 papers and open-source projects
[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | Based on the CPM foundation...
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation
The collection of pre-trained, state-of-the-art AI models for ailia SDK
VILA - a multi-image visual language model with training, inference and evaluation recipe, deploy...
T2I-Adapter
Character Animation (AnimateAnyone, Face Reenactment)
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning