VPTQ, a flexible and extreme low-bit quantization algorithm
MIT License
Tutel MoE: An Optimized Mixture-of-Experts Implementation
AICI: Prompts as (Wasm) Programs
A Python package for generating concise, high-quality summaries of a probability distribution
[NeurIPS'24 Spotlight] To speed up Long-context LLMs' inference, approximate and dynamic sparse c...
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using S...
Subseasonal forecasting models
Repo for WWW 2022 paper: Progressively Optimized Bi-Granular Document Representation for Scalable...
Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documen...
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Foundation Architecture for (M)LLMs
A JavaScript toolkit for Natural Language-based Visualization Authoring
A unified evaluation framework for large language models