CS PhD student at UC Berkeley
A high-throughput and memory-efficient inference and serving engine for LLMs
Python - Released: 09 Feb 2023 - 28,039
[NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers
Python - Released: 31 Oct 2021 - 160