MInference

MInference speeds up long-context LLM inference by computing attention with approximate, dynamic sparse patterns, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
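To illustrate the general idea of dynamic block-sparse attention, here is a minimal, hypothetical PyTorch sketch. It is not MInference's actual implementation (which relies on optimized GPU kernels and per-head sparse patterns); it simply estimates which key blocks matter for each query block from cheap pooled scores, then computes exact attention only over those blocks:

```python
import torch

def dynamic_sparse_attention(q, k, v, block_size=64, top_blocks=2):
    """Toy block-sparse attention: pick the most relevant key blocks per
    query block via mean-pooled approximate scores, then attend only there."""
    n, d = q.shape
    nb = n // block_size  # assume n is divisible by block_size for simplicity
    scale = d ** -0.5
    # Cheap approximation: one pooled vector per block of queries/keys.
    qb = q.view(nb, block_size, d).mean(dim=1)   # (nb, d)
    kb = k.view(nb, block_size, d).mean(dim=1)   # (nb, d)
    block_scores = (qb @ kb.T) * scale           # (nb, nb) block-level scores
    keep = block_scores.topk(min(top_blocks, nb), dim=-1).indices
    out = torch.empty_like(q)
    for i in range(nb):
        rows = slice(i * block_size, (i + 1) * block_size)
        # Gather only the selected key/value blocks for this query block.
        ks = torch.cat([k[j * block_size:(j + 1) * block_size]
                        for j in keep[i].tolist()])
        vs = torch.cat([v[j * block_size:(j + 1) * block_size]
                        for j in keep[i].tolist()])
        attn = torch.softmax((q[rows] @ ks.T) * scale, dim=-1)
        out[rows] = attn @ vs
    return out

# Example: a 256-token sequence with 32-dim heads.
q, k, v = (torch.randn(256, 32) for _ in range(3))
print(dynamic_sparse_attention(q, k, v).shape)  # torch.Size([256, 32])
```

Because each query block attends to only top_blocks key blocks rather than the full sequence, the attention cost drops from quadratic toward linear in sequence length, which is where the pre-filling speedup comes from.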

License: MIT
Downloads: 501
Stars: 725
Committers: 4

Commit Statistics

                              Past Year   All Time
Total Commits                 71          71
Total Committers              4           4
Avg. Commits Per Committer    17.75       17.75
Bot Commits                   1           1

Issue Statistics

                              Past Year   All Time
Total Pull Requests           38          38
Merged Pull Requests          36          36
Total Issues                  91          91
Time to Close Issues          2 days      2 days