To speed up inference for long-context LLMs, attention is computed with approximate, dynamic sparse methods, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
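A minimal sketch of the general idea, block-level dynamic sparse attention, assuming mean-pooled block scores as the cheap importance estimate (the `block_sparse_attention` below is an illustration of the technique, not this repository's implementation; causal masking is omitted for brevity):

```python
# Illustrative dynamic block-sparse attention: for each query block,
# estimate each key block's importance from pooled dot products, keep
# only the top-k key blocks, and run dense attention inside that pattern.
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block=64, keep=4):
    # q, k, v: (seq, dim); seq assumed divisible by `block` for brevity.
    seq, dim = q.shape
    nb = seq // block
    qb = q.view(nb, block, dim)
    kb = k.view(nb, block, dim)
    vb = v.view(nb, block, dim)
    # Cheap importance estimate: mean-pooled query block vs. key block.
    scores = qb.mean(1) @ kb.mean(1).T                   # (nb, nb)
    top = scores.topk(min(keep, nb), dim=-1).indices     # (nb, keep)
    out = torch.empty_like(qb)
    scale = dim ** -0.5
    for i in range(nb):
        ks = kb[top[i]].reshape(-1, dim)                 # selected key blocks
        vs = vb[top[i]].reshape(-1, dim)
        attn = F.softmax(qb[i] @ ks.T * scale, dim=-1)
        out[i] = attn @ vs
    return out.view(seq, dim)

q = k = v = torch.randn(512, 32)
print(block_sparse_attention(q, k, v).shape)  # torch.Size([512, 32])
```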
Building modular LMs with parameter-efficient fine-tuning.
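As a sketch of the parameter-efficient idea, a minimal LoRA-style adapter where only a low-rank update to a frozen base weight is trained (`LoRALinear` is a hypothetical name for illustration, not this repository's API):

```python
# LoRA-style adapter sketch: the frozen base weight W is augmented with a
# low-rank update B @ A, and only A and B receive gradients.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_dim, out_dim, rank=8, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim, bias=False)
        self.base.weight.requires_grad_(False)             # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(rank, in_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_dim, rank))  # zero-init: no-op at start
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(128, 128)
print(layer(torch.randn(4, 128)).shape)  # torch.Size([4, 128])
```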
A Python package for generating concise, high-quality summaries of a probability distribution
Foundation Architecture for (M)LLMs
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
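For intuition, a toy sketch of the window partitioning plus cyclic shift that shifted-window attention builds on (illustrative only, not the official implementation):

```python
# The feature map is split into non-overlapping windows; for the "shifted"
# layers it is first rolled by half a window so windows straddle the
# previous layer's boundaries.
import torch

def window_partition(x, win):
    # x: (H, W, C) -> (num_windows, win*win, C)
    H, W, C = x.shape
    x = x.view(H // win, win, W // win, win, C)
    return x.permute(0, 2, 1, 3, 4).reshape(-1, win * win, C)

x = torch.randn(8, 8, 16)
shifted = torch.roll(x, shifts=(-2, -2), dims=(0, 1))   # half-window shift
print(window_partition(shifted, win=4).shape)           # (4, 16, 16)
```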
A unified evaluation framework for large language models
Repo for the WWW 2022 paper "Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval".
Official codebase for MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks (published at NAACL 2024).
Tutel MoE: An Optimized Mixture-of-Experts Implementation
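A toy top-2 routing sketch of the mixture-of-experts pattern Tutel optimizes (illustrative only; Tutel's actual implementation uses fused, distributed dispatch kernels):

```python
# Top-2 MoE routing: a softmax gate picks two experts per token, and their
# outputs are combined weighted by the gate probabilities.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, dim=64, n_experts=4, k=2):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.k = k

    def forward(self, x):                       # x: (tokens, dim)
        probs = F.softmax(self.gate(x), dim=-1)
        w, idx = probs.topk(self.k, dim=-1)     # per-token expert weights/ids
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e        # tokens routed to expert e
                if mask.any():
                    out[mask] += w[mask, slot, None] * expert(x[mask])
        return out

moe = TinyMoE()
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```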
Generation of protein sequences and evolutionary alignments via discrete diffusion models
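A minimal sketch of the forward (noising) half of discrete diffusion over token sequences, assuming an absorbing mask state and a linear schedule (an illustration of the general technique, not this codebase's API):

```python
# Forward corruption for discrete diffusion: with probability t / T, each
# token is replaced by an absorbing MASK token; a model is then trained to
# reverse this process and recover the original sequence.
import torch

AA = "ACDEFGHIKLMNPQRSTVWY"             # the 20 standard amino acids
MASK = len(AA)                          # extra absorbing "mask" token id

def corrupt(tokens, t, T):
    # Linear schedule: at step t, each position is masked with prob t / T.
    keep = torch.rand(tokens.shape) >= t / T
    return torch.where(keep, tokens, torch.full_like(tokens, MASK))

seq = torch.randint(0, len(AA), (1, 16))        # a random toy "protein"
print(corrupt(seq, t=8, T=10))                  # mostly MASK ids at high t
```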
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
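A usage sketch based on MII's documented non-persistent pipeline API (the model name is a placeholder; exact signatures may vary across MII versions):

```python
# Non-persistent text-generation pipeline, per MII's README-style usage.
import mii

pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")  # placeholder HF model name
response = pipe(["DeepSpeed is"], max_new_tokens=128)
print(response)
```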
AICI: Prompts as (Wasm) Programs