Building modular LMs with parameter-efficient fine-tuning.
MSCCL++: A GPU-driven communication stack for scalable AI applications
Trace, the New AutoDiff for AI Systems and LLM Agents
AICI: Prompts as (Wasm) Programs
Repo for WWW 2022 paper: Progressively Optimized Bi-Granular Document Representation for Scalable...
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Tutel MoE: An Optimized Mixture-of-Experts Implementation
Community for applying LLMs to robotics and a robot simulator with ChatGPT integration
Official codebase for MEGAVERSE (published at NAACL 2024)
[NeurIPS'24 Spotlight] To speed up long-context LLMs' inference, approximate and dynamic sparse c...
Library to convert natural language utterance into a structured domain specific language
Shared Middle-Layer for Triton Compilation