Michael Goin

LLM inference optimization and HPC engineering @neuralmagic

Projects

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python - Released: 09 Feb 2023 - 28,039 stars

deepsparse

Sparsity-aware deep learning inference runtime for CPUs

Python - Released: 14 Dec 2020 - 2,986 stars

sparseml

Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models

Python - Released: 11 Dec 2020 - 1,976 stars

learned_indexes

Experiments on ideas proposed in Tim Kraska's "The Case for Learned Index Structures"

Python - Released: 05 Feb 2019 - 3 stars