Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
APACHE-2.0 License
Bot releases are hidden (Show)
Medusa is a easy-to-use framework that democratizes the acceleration techniques for LLM generation. Medusa-v0.1 uses several extra light-weighted decoding head, and exclude the need for draft model.