Woosuk Kwon

CS PhD student at UC Berkeley

Ecosystems: PyTorch, Llama, CUDA, Python

Projects

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python - Released: 09 Feb 2023 - 28,039

retraining-free-pruning

[NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers

Python - Released: 31 Oct 2021 - 160