Medusa

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

APACHE-2.0 License

Downloads: 96
Stars: 2.2K


Medusa - Medusa-v0.1 Latest Release

Published by harveyp123 about 1 year ago

Medusa is an easy-to-use framework that democratizes acceleration techniques for LLM generation. Medusa-v0.1 adds several lightweight extra decoding heads to the base model, which eliminates the need for a separate draft model.
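
To make the idea of extra decoding heads more concrete, here is a toy sketch in plain Python. It is not Medusa's API: the real framework works on neural network outputs, while this sketch uses simple arithmetic stand-ins. All names (`base_next_token`, `make_extra_head`, `medusa_style_step`, `generate`) are hypothetical, and the verification rule shown (keep the longest run of guesses the base model agrees with) is an assumed simplification. The point is only that the extra heads guess several future tokens from the same context, the base model checks those guesses, and no second draft model is involved.

```python
from typing import Callable, List, Sequence, Tuple

Token = int
# A "head" maps the current context to a guess for one future token.
Head = Callable[[Sequence[Token]], Token]


def base_next_token(context: Sequence[Token]) -> Token:
    """Toy stand-in for the base model: next token = (last + 1) mod 10."""
    return (context[-1] + 1) % 10


def make_extra_head(offset: int, wrong_on: int = -1) -> Head:
    """Toy extra head that guesses the token `offset` positions ahead.

    It is deliberately wrong when the last context token equals `wrong_on`,
    to show that bad guesses are rejected during verification.
    """
    def head(context: Sequence[Token]) -> Token:
        guess = (context[-1] + offset) % 10
        return (guess + 5) % 10 if context[-1] == wrong_on else guess
    return head


def medusa_style_step(context: List[Token], extra_heads: Sequence[Head]) -> List[Token]:
    """One decoding step: propose several tokens, keep the verified prefix."""
    # The base model's own next token is always accepted.
    candidates = [base_next_token(context)] + [h(context) for h in extra_heads]
    accepted = [candidates[0]]
    for guess in candidates[1:]:
        # Assumed simplification: accept a guess only if the base model
        # would have produced the same token at that position.
        if base_next_token(context + accepted) != guess:
            break
        accepted.append(guess)
    return accepted


def generate(prompt: List[Token], n_tokens: int,
             extra_heads: Sequence[Head]) -> Tuple[List[Token], int]:
    """Greedy generation; returns (tokens, number_of_decoding_steps)."""
    out, steps = list(prompt), 0
    while len(out) - len(prompt) < n_tokens:
        out.extend(medusa_style_step(out, extra_heads))
        steps += 1
    return out[: len(prompt) + n_tokens], steps


if __name__ == "__main__":
    heads = [make_extra_head(2), make_extra_head(3, wrong_on=3)]
    print(generate([0], 9, heads))  # with extra heads
    print(generate([0], 9, []))     # plain one-token-per-step decoding
```

In this toy setup both calls produce the same token sequence, but the run with extra heads needs fewer decoding steps because each step can accept more than one token. A rejected guess costs nothing in output quality, since only tokens the base model agrees with are kept.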